Scale-Aware Multispectral Fusion of RGB and NIR Images Based on Alternating Guidance

In low light condition, color (RGB) images captured by increasing the camera ISO suffer from severe noise and detail loss, whereas near infrared (NIR) images are robust to noise and have clear textures but no color. In this paper, we propose scale-aware multispectral fusion of RGB and NIR images based on alternating guidance. Low light RGB images provide large-scale image structure and color information, while NIR images retain fine details lost in RGB images. Since the two are complementary, we fuse them by alternating guidance using weighted least squares (WLS). First, we perform the first guidance to denoise the RGB image and obtain the base layer. Then, we conduct the second guidance for scale-aware detail transfer from the NIR image and obtain the detail layer. Finally, we combine the base and detail layers to generate the fusion image. This alternating guidance maximizes the multispectral advantage of the RGB and NIR images. Experimental results show that the proposed method achieves good performance in noise reduction, detail transfer, and color reproduction, and is superior to state-of-the-art methods in terms of quantitative measurement and computational efficiency.


I. INTRODUCTION
With advances in sensor technology, image types have become highly diversified. In addition to the widely used visible RGB cameras, there exist depth cameras to record depth information, infrared (IR) and NIR cameras for invisible wavelength band imaging, and X-ray cameras for medical imaging. Due to the increasing demands of computer vision applications, the requirements for imaging quality are becoming higher and higher. To maximize the advantages of various sensors, image fusion uses multiple sensors to improve imaging quality and the accuracy of vision applications [20], [28]. Recent image fusion methods include the fusion of flash/no-flash images [19], infrared/RGB images [17], and near infrared (NIR)/RGB images [13], [21], [34]. In this paper, we focus on the multispectral fusion of RGB and NIR images in low light condition.
The associate editor coordinating the review of this manuscript and approving it for publication was Yongqiang Zhao .

A. RELATED WORK
In low light condition, it is hard to capture high quality RGB images without flash due to low signal-to-noise ratio (SNR). A commonly used remedy is to increase the camera ISO setting. However, the quality of RGB images is seriously degraded by noise at a high ISO value, and thus edges and details are severely destroyed. IR band imaging acquires images stably in adverse environments such as low light. It is widely used in object recognition [25], object detection [5], video surveillance [9], and remote sensing [24]. Although IR images have the advantage of resisting unfavorable environments, they generally have low resolution and poor textures. These defects limit IR images in applications requiring high-quality images, such as night vision systems. In contrast, in the NIR band (750-1400 nm) close to the visible band [2], NIR images have high resolution, clean textures, and robustness to noise. Therefore, compared with IR images, NIR images are more suitable for night vision systems that require higher image quality, such as video surveillance [22]. Although NIR images lack color information, it is easy to simultaneously acquire RGB and NIR images, which are complementary, through hybrid camera systems. Therefore, acquiring a fusion image with color information and clean details through the fusion of RGB and NIR images provides a solution to high-quality imaging in low light condition.
Prior to introducing RGB and NIR image fusion, we first review RGB and IR image fusion. The research on RGB and IR image fusion has a long history, and it can be divided into seven categories: multiscale transform [10], [18], [35], sparse representation [11], [28], neural networks [8], [29], subspace methods [1], [7], saliency-based methods [33], [36], hybrid models [12], [15], and other methods [14], [37]. NIR images provide higher resolution and better details than IR ones in low light condition. Therefore, the fusion of RGB and NIR images is more suitable for producing high-quality fusion results in low light condition. However, due to the difference in contrast and structure between the luminance channel of the RGB image and the NIR image, the luminance channel cannot simply be replaced by the NIR image. Such direct replacement leads to color distortion and structural destruction. To address the contrast difference, Son [26] proposed a method for low-light color image denoising based on contrast conversion between NIR images and luminance channels. Son et al. [27] further proposed an NIR coloring method using a contrast-preserving mapping model. To preserve the structural information of RGB and NIR images, Shibata et al. [23] proposed a fusion method based on high visibility area selection. Yan et al. [30] explicitly modelled derivative-level confidence and proposed cross-field joint image fusion by optimizing a scale map.

B. CONTRIBUTIONS
In this paper, we propose scale-aware fusion of RGB and NIR images in low light condition based on alternating guidance. Noisy RGB images contain color information and large-scale image structures, while NIR images include small-scale fine textures. We adopt alternating guidance for the fusion of RGB and NIR images based on WLS to make full use of their respective advantages. First, we perform the first guidance for denoising the noisy RGB image and obtain the base layer. For the first guidance, the joint guidance of the NIR image and the denoised luminance of the RGB image is employed to remove noise while retaining edges. Since NIR imaging highly depends on the NIR light strength, NIR images have no or very small values beyond the range that the NIR light reaches. This usually happens in the outdoor environment, especially for the background at night time, which makes the noisy RGB information also useful for fusion (see Fig. 1). Thus, we perform sigmoid-based NIR weighting for base layer generation (BLG) to achieve different smoothing degrees according to the NIR intensity. Then, we conduct the second guidance for scale-aware detail transfer on the NIR image and obtain the detail layer. Compared with direct smoothing of the NIR image, the second guidance is able to recover more complete small-scale textures lost in the noisy RGB image. Finally, we combine the base and detail layers to produce the fusion result. As shown in Fig. 1, the proposed method successfully transfers details from the NIR image to the fusion result with noise removal and color reproduction. We capture the image pairs in low light condition by a JAI AD-130 GE camera (https://www.jai.com/products/ad-130-ge), which is able to simultaneously capture RGB and NIR images through the same optical path with two CCDs. As shown in the figure, RGB images captured in the low illumination condition have color but severe noise and detail loss, whereas NIR images captured in the same condition have fine details with little noise.
Thus, they are complementary, and fusing them takes advantage of both. The preliminary result of this paper was presented in [38]. In this paper, we extend our previous work in three respects. First, we perform sigmoid-based NIR weighting to selectively take both NIR textures (foreground) and RGB information (background) in fusion. Second, we remove NIR highlights in fusion by the joint guidance of the NIR image and the Y channel of the RGB image. Third, we capture real image pairs by the JAI AD-130 GE camera and verify the effectiveness of the proposed method on them.
Compared with existing methods, the main contributions of the proposed method are as follows:
• We propose scale-aware fusion of RGB and NIR images based on alternating guidance to make full use of the multispectral advantage. Low light RGB images contain large-scale image structures with much noise, while NIR images include small-scale fine textures without color. Since they are complementary, we adopt alternating guidance to achieve scale-aware fusion of the paired images.
• We adopt WLS to alternately use RGB and NIR images as guidance for noise removal, detail transfer, and NIR highlight removal. WLS is an edge-aware smoothing filter based on global optimization, which is used in the first guidance (RGB image denoising) to get the base layer and in the second guidance (NIR texture transfer) to obtain the detail layer. We combine the base and detail layers to generate the fusion image.
• We achieve high computational efficiency with O(N) time complexity by WLS-based fast global smoothing.

II. PROPOSED METHOD
Fig. 2 shows the whole framework of the proposed scale-aware RGB/NIR image fusion based on alternating guidance. We adopt weighted least squares (WLS) for fast global smoothing, which is a global edge-preserving filter, and perform scale-aware alternating guidance for the fusion of RGB and NIR images based on WLS. In the first guidance, we use the combination of the NIR image and the denoised RGB luminance channel as joint guidance to get the base layer; this guidance effectively removes noise while protecting the structure of the RGB image. In the second guidance, we utilize an over-smoothed RGB image as guidance to obtain the detail layer. Since this over-smoothed RGB image only provides the large-scale structure of the RGB image, the details of different scales and contrasts lost in the RGB image are completely taken from the NIR image by guiding the NIR image smoothing; the second guidance thus takes clear textures from the NIR image. Finally, we combine the base and detail layers to generate the fusion image.

A. SIGMOID-BASED NIR WEIGHTING
To effectively aggregate the multispectral information, we perform independent processing of the foreground and background segmented by the NIR intensity. Since NIR imaging highly depends on the NIR intensity, the NIR image contains rich information within the range that the NIR light can reach, but has little or no information beyond that range. Thus, we mainly use the NIR information within the range, and mainly utilize the RGB information beyond it. To achieve different smoothing degrees according to the NIR intensity, we perform sigmoid-based NIR weighting as follows:

w(i) = 1 / (1 + exp(−ε(i − τ)))

where i represents the NIR intensity, ε is a parameter that adjusts the steepness of the curve, and τ is a parameter that controls the threshold.
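As a concrete illustration, the weighting above can be written as a small function. The parameter values used here (ε = 10, τ = 0.2, intensities normalized to [0, 1]) are illustrative assumptions, not the paper's settings:

```python
import math

def nir_weight(i, eps=10.0, tau=0.2):
    """Sigmoid-based NIR weight for a normalized NIR intensity i in [0, 1].

    eps adjusts the steepness of the curve and tau controls the threshold:
    intensities well above tau get a weight near 1 (trust NIR textures),
    while intensities well below tau get a weight near 0 (fall back to
    the RGB information).
    """
    return 1.0 / (1.0 + math.exp(-eps * (i - tau)))
```

At i = τ the weight is exactly 0.5, so τ acts as the foreground/background threshold, while a larger ε makes the foreground/background transition sharper.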

B. WEIGHTED LEAST SQUARES
WLS is an edge-aware smoothing filter based on a global optimization formulation [16], which consists of a data term and a prior term. The prior term enforces smoothing through a weighted L2 norm. Given an input image f and a guidance image g, an output image u is obtained by minimizing the following WLS energy function:

E(u) = Σ_p ( (u_p − f_p)^2 + λ Σ_{q∈N(p)} ω_{p,q}(g) (u_p − u_q)^2 )   (1)

where N(p) represents the set of four adjacent pixels of p; λ controls the balance between the data and smoothing terms, and increasing λ results in a smoother output; and ω_{p,q}(g) is the weight calculated from the guidance image g that measures the similarity between pixels p and q. ω_{p,q}(g) is defined as follows:

ω_{p,q}(g) = exp(−|g_p − g_q| / σ)   (2)

where σ is a range parameter. The energy function in Eq. (1) is transformed into a vector form as follows:

E(u) = (u − f)^T (u − f) + λ u^T A_g u   (3)

where u and f denote S × 1 column vectors containing the values of u and f, respectively, and S is the total number of pixels; T denotes transposition; and A_g is the S × S Laplacian matrix defined as follows [4]:

(A_g)_{p,q} = Σ_{r∈N(p)} ω_{p,r}(g) if p = q;  −ω_{p,q}(g) if q ∈ N(p);  0 otherwise   (4)

Based on this large sparse matrix, the energy function is minimized by solving the following linear system:

(I + λ A_g) u = f   (5)

However, solving it by matrix inversion has high computational complexity. By the fast global smoothing of WLS, the time complexity can reach O(N). First, we consider the one-dimensional (1D) case, assuming that the WLS energy function works on a 1D horizontal input signal f^h and a 1D guiding signal g^h along the x dimension (x = 0, ..., W − 1). The energy function of the 1D signal is as follows:

E(u^h) = Σ_x ( (u^h_x − f^h_x)^2 + λ_t Σ_{x'∈N_h(x)} ω_{x,x'}(g^h) (u^h_x − u^h_{x'})^2 )   (6)

where N_h(x) represents the two neighbors of x. This energy function is minimized by the following linear equation:

(I_h + λ_t A_h) u^h = f^h   (7)

where I_h is an identity matrix of size W × W; u^h and f^h represent the vector notations of u^h and f^h, respectively; and A_h is a three-point Laplacian matrix of size W × W. The linear system in Eq. (7) is written row by row as follows:

a_x u^h_{x−1} + b_x u^h_x + c_x u^h_{x+1} = f^h_x   (8)

where u^h_x and f^h_x are the x-th elements of u^h and f^h, respectively, and a_x, b_x, and c_x represent the three nonzero elements in the x-th row of (I_h + λ_t A_h). At the boundaries, a_0 = 0 and c_{W−1} = 0. a_x, b_x, and c_x are written as:

a_x = −λ_t ω_{x,x−1}(g^h),  b_x = 1 + λ_t (ω_{x,x−1}(g^h) + ω_{x,x+1}(g^h)),  c_x = −λ_t ω_{x,x+1}(g^h)   (9)

The matrix (I_h + λ_t A_h) is a tridiagonal matrix whose nonzero elements exist only on the main diagonal and the two adjacent diagonals. By Gaussian elimination, the solution reaches O(N) complexity. In the forward pass of Gaussian elimination, the intermediate variables c̃_x and f̃^h_x are computed as follows:

c̃_x = c_x / (b_x − c̃_{x−1} a_x)   (10)
f̃^h_x = (f^h_x − f̃^h_{x−1} a_x) / (b_x − c̃_{x−1} a_x)   (11)

with c̃_{−1} = 0 and f̃^h_{−1} = 0. The output is then recovered by back substitution:

u^h_x = f̃^h_x − c̃_x u^h_{x+1}   (12)

with u^h_{W−1} = f̃^h_{W−1}. To process a two-dimensional (2D) image signal using the 1D solver, we perform 1D global smoothing along each dimension of the 2D signal. To prevent the streaking artifact that commonly appears in separable algorithms [3], we perform 2D smoothing by applying sequential 1D global smoothing for multiple iterations [16]. In this scheme, λ_t in the t-th iteration is computed as follows:

λ_t = (3/2) · (4^{T−t} / (4^T − 1)) · λ   (13)

where T represents the total number of iterations along each dimension. In each iteration, we apply the 1D solver with parameter λ_t along the x dimension and then the y dimension of the 2D image.
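The 1D solve described above can be sketched in a few lines of pure Python. This is a minimal version written from the description in this section, not the authors' code; the function names and parameter values are illustrative:

```python
import math

def lam_schedule(lam, t, T):
    """Per-iteration lambda_t for T sequential 1D smoothing passes."""
    return 1.5 * lam * 4 ** (T - t) / (4 ** T - 1)

def fgs_1d(f, g, lam_t, sigma):
    """Solve (I + lam_t * A_h) u = f for a 1D signal f with guidance g.

    The system is tridiagonal, so it is solved in O(W) by forward
    Gaussian elimination followed by back substitution (Thomas algorithm).
    """
    W = len(f)
    # Neighbor weights from the guidance signal: w[x] links pixels x and x+1.
    w = [math.exp(-abs(g[x] - g[x + 1]) / sigma) for x in range(W - 1)]
    ct = [0.0] * W   # intermediate c-tilde values
    ft = [0.0] * W   # intermediate f-tilde values
    for x in range(W):
        wl = w[x - 1] if x > 0 else 0.0          # weight to the left neighbor
        wr = w[x] if x < W - 1 else 0.0          # weight to the right neighbor
        a = -lam_t * wl                          # a_x
        b = 1.0 + lam_t * (wl + wr)              # b_x
        c = -lam_t * wr                          # c_x
        denom = b - a * (ct[x - 1] if x > 0 else 0.0)
        ct[x] = c / denom
        ft[x] = (f[x] - a * (ft[x - 1] if x > 0 else 0.0)) / denom
    # Back substitution.
    u = [0.0] * W
    u[W - 1] = ft[W - 1]
    for x in range(W - 2, -1, -1):
        u[x] = ft[x] - ct[x] * u[x + 1]
    return u
```

Two properties follow directly from the formulation: a constant signal is a fixed point of the solve, because the Laplacian annihilates constants, and a large jump in the guidance drives the corresponding weight toward zero, so edges present in the guide are preserved in the output.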

C. FIRST GUIDANCE FOR NOISY RGB DENOISING
We perform the first guidance for RGB denoising and obtain the base layer. For the first guidance, we utilize both the NIR image and the denoised luminance channel of the RGB image. NIR images contain image structure with clean textures and little noise; thus, using the NIR image as guidance achieves good denoising performance on the noisy RGB image. However, due to the contrast difference between the NIR image and the RGB luminance channel shown in Fig. 3, directly adopting the NIR image as guidance would ruin some edges of the RGB image where the contrast of the NIR image does not match that of the RGB luminance channel. Moreover, using the RGB image itself as the guidance (self-guidance) cannot exploit the clean NIR textures for denoising. Thus, we adopt both the NIR image and the denoised luminance channel as guidance to remove noise while keeping the image structure. To remove salt-and-pepper noise in the input luminance Y_I, we use the fast weighted median filter (FWMF) [32] and get Y_n. Then, we perform element-wise addition of Y_n and the NIR image N_I to obtain the guidance G_n for denoising the noisy RGB image C_I. The base layer C_B is obtained by minimizing the following energy function:

E(C_B) = (C_B − C_I)^T (C_B − C_I) + λ C_B^T A_{G_n} C_B   (14)

where C_I represents the column vector containing the values of C_I; A_{G_n} denotes the Laplacian matrix defined by the joint guidance G_n = Y_n + N_I; and the range parameter σ is set to σ_2. Fig. 4 shows the RGB denoising results of the first guidance. Guided by C_I itself, i.e. self-guidance, the first guidance cannot achieve satisfactory denoising performance (see Fig. 4(b)). Guided by N_I only, i.e. the NIR image, the first guidance makes the picture much blurrier (see Fig. 4(c)): the contrast difference between the RGB and NIR images, such as in the red boxes of Fig. 3, causes serious blurring of edges. Guided by Y_n + N_I, the first guidance successfully removes noise while preserving the structure of the RGB image (see Fig. 4(d)).
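The construction of the joint guidance can be illustrated on a toy 1D example. A plain 3-tap median filter stands in here for the FWMF of [32], and the signal values are illustrative, not real image data:

```python
def median3(sig):
    """3-tap median filter, a simple stand-in for the fast weighted
    median filter (FWMF) used to remove salt-and-pepper noise."""
    out = list(sig)
    for i in range(1, len(sig) - 1):
        out[i] = sorted(sig[i - 1:i + 2])[1]
    return out

# Toy luminance Y_I with one salt impulse, and a toy NIR signal N_I.
Y_I = [0.2, 0.2, 1.0, 0.2, 0.2]   # impulse at index 2
N_I = [0.1, 0.3, 0.5, 0.3, 0.1]

Y_n = median3(Y_I)                          # denoised luminance
G_n = [y + n for y, n in zip(Y_n, N_I)]     # joint guidance Y_n + N_I
```

The impulse at index 2 is removed before the addition, so the guidance G_n carries the NIR structure without inheriting the salt-and-pepper noise of the luminance channel.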

D. SECOND GUIDANCE FOR SCALE-AWARE DETAIL TRANSFER
In low light condition, RGB images often lose details due to severe noise. Thus, we use the input NIR image to recover the details lost in the noisy RGB image. To transfer the multiscale details of the NIR image, we use the RGB image with large-scale structure to guide the NIR detail transfer, which we call scale-aware detail transfer. We obtain the second guidance image by over-smoothing the base layer of the input RGB image so that it contains only the large-scale structure. Therefore, based on the second guidance, details of multiple scales and contrasts in the NIR image are successfully transferred. As shown in Fig. 5(a), the black text on the tea box in the noisy RGB image is mostly destroyed and only its color is identifiable, whereas it is very clear in the NIR image. The details lost in the noisy RGB image have various scales and textures, as shown in Fig. 5(a) (see the red box in the NIR image). In practice, details in the NIR image range from small to large scales, so it is inappropriate to apply a single fixed smoothing to the NIR image as the second guidance for detail transfer. Directly smoothing the NIR image, i.e. self-guidance, causes loss of some textures and color distortion. If we use small-scale smoothing of the NIR image as the second guidance, large-scale textures are not transferred (see Fig. 5(b)). On the contrary, if we use large-scale smoothing of the NIR image as the second guidance, most details are smoothed away so that some unwanted details are transferred (see Fig. 5(c)); these unwanted details cause serious color distortion in the fusion results compared with the input color image. Thus, we use the over-smoothed RGB image, C_S, as the second guidance for detail transfer. We obtain C_S from the base layer C_B, i.e. the output of the first guidance. Guided by C_S, we successfully generate the detail layer N_D from the input NIR image N_I.
Since C_S maintains the large-scale structure of the RGB image, the second guidance acquires details of various scales from the NIR image that are lost in the noisy RGB image (see Fig. 5(d)). For the second guidance, we first obtain C_S from C_B using WLS as follows:

E(C_S) = (C_S − C_B)^T (C_S − C_B) + λ C_S^T A_{C_B} C_S   (15)

where C_B represents the column vector form of C_B; A_{C_B} denotes the Laplacian matrix defined by C_B; and the range parameter σ is set to σ_3. By minimizing the energy function in Eq. (15), we obtain C_S. Then, we obtain the smoothed NIR image N_S under the guidance of C_S by minimizing the following energy function:

E(N_S) = (N_S − N_I)^T (N_S − N_I) + λ N_S^T A_{C_S} N_S   (16)

where N_I represents the column vector form of N_I; A_{C_S} denotes the Laplacian matrix defined by C_S; and the range parameter σ is set to σ_4. We acquire the detail layer N_D through pixel-wise subtraction: N_D = N_I − N_S.
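The scale-aware detail transfer can be illustrated on a 1D toy example. The helper below repeats the guided tridiagonal WLS solve of Sec. B in compact form; the signal values, λ, and σ are illustrative assumptions:

```python
import math

def wls_1d(f, g, lam, sigma):
    """Guided 1D WLS solve of (I + lam * A_h) u = f (Thomas algorithm)."""
    W = len(f)
    w = [math.exp(-abs(g[x] - g[x + 1]) / sigma) for x in range(W - 1)]
    ct, ft = [0.0] * W, [0.0] * W
    for x in range(W):
        wl = w[x - 1] if x > 0 else 0.0
        wr = w[x] if x < W - 1 else 0.0
        a, b, c = -lam * wl, 1.0 + lam * (wl + wr), -lam * wr
        denom = b - a * (ct[x - 1] if x > 0 else 0.0)
        ct[x] = c / denom
        ft[x] = (f[x] - a * (ft[x - 1] if x > 0 else 0.0)) / denom
    u = [0.0] * W
    u[W - 1] = ft[W - 1]
    for x in range(W - 2, -1, -1):
        u[x] = ft[x] - ct[x] * u[x + 1]
    return u

# Toy NIR signal: fine texture on both sides of a large-scale step,
# and an over-smoothed "RGB" guide C_S that carries only the step.
N_I = [0.0, 0.1, 0.0, 0.1, 1.0, 1.1, 1.0, 1.1]
C_S = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]

N_S = wls_1d(N_I, C_S, 5.0, 0.1)              # second guidance
N_D = [ni - ns for ni, ns in zip(N_I, N_S)]   # detail layer N_I - N_S
```

Because C_S keeps only the large-scale step, the solve smooths the fine texture on each side of the step but does not blur across it, so N_D contains exactly the small-scale texture to be added back to the base layer.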

E. REMOVAL OF NIR HIGHLIGHTS
In NIR images, highlights often appear in the human eyes. They usually occur at night because the pupils dilate to let in more light: much of the NIR light passes into the eye through the pupils and is then reflected back through them, and the NIR camera records this reflected light. This is very similar to the red-eye effect in flash photography. Fig. 6 shows the NIR highlights that appear in the human eyes. To suppress them, we perform the second guidance using the image smoothed under the joint guidance of the NIR image and the Y channel of C_S, instead of the smoothed base layer C_S alone. For comparison, we provide the fusion results obtained with the two smoothed images. As shown in Fig. 7, the second guidance with the joint guidance of the NIR image and the Y channel of C_S successfully removes the NIR highlights of the human eyes in fusion.

F. LAYER FUSION
We reconstruct the fusion image by combining the base layer C_B and the detail layer N_D. First, we convert C_B to the YUV color space as follows:

Y_B = 0.299 R + 0.587 G + 0.114 B
U_B = −0.147 R − 0.289 G + 0.436 B   (17)
V_B = 0.615 R − 0.515 G − 0.100 B

where R, G, and B represent the red, green, and blue channels of C_B, respectively. We use the YUV color space to transfer the NIR details to the fusion result without color shift. Then, we combine Y_B and N_D through pixel-wise addition to generate the fused luminance channel Y as follows:

Y = Y_B + N_D   (18)

Finally, we convert Y, U_B, and V_B back to the RGB color space to generate the fusion RGB image C_O as follows:

R_O = Y + 1.140 V_B
G_O = Y − 0.395 U_B − 0.581 V_B   (19)
B_O = Y + 2.032 U_B
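The layer fusion step can be sketched per pixel as follows; the conversion uses the conventional analog YUV (BT.601-style) coefficients, which are a standard choice rather than values taken from this excerpt:

```python
def rgb_to_yuv(r, g, b):
    """Conventional analog YUV (BT.601-style) forward conversion."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.147 * r - 0.289 * g + 0.436 * b
    v = 0.615 * r - 0.515 * g - 0.100 * b
    return y, u, v

def yuv_to_rgb(y, u, v):
    """Inverse conversion back to RGB (coefficients rounded as published)."""
    r = y + 1.140 * v
    g = y - 0.395 * u - 0.581 * v
    b = y + 2.032 * u
    return r, g, b

def fuse_pixel(rgb_base, nir_detail):
    """Add the NIR detail to the luminance of the base-layer pixel only,
    leaving the chrominance (U, V) untouched to avoid color shift."""
    y, u, v = rgb_to_yuv(*rgb_base)
    return yuv_to_rgb(y + nir_detail, u, v)
```

Since U and V pass through unchanged, the fusion adds texture without shifting hue, and with a zero detail value the round trip returns the input pixel up to the small rounding of the published coefficients.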

III. EXPERIMENTAL RESULTS
For experiments, we use synthetic image pairs in Figs. 8-10 and real image pairs in Figs. 11-13. The synthetic image pairs are indoor scenes, while the real image pairs are outdoor scenes. We synthesize the low light RGB images of the synthetic pairs (Figs. 8-10) by adding Gaussian noise and salt-and-pepper noise to the clean RGB images of a publicly available dataset [2]. We capture the real image pairs (Figs. 11-13) using a JAI AD-080GE camera at night time; this camera can capture RGB and NIR images simultaneously using the same optical path with two CCDs. In the real image pairs, the RGB images are heavily corrupted by noise with severe loss of details; thus, they are much more degraded and challenging for fusion than the synthetic ones. We perform our experiments on a PC with an Intel i5-6500 CPU (3.2 GHz) and 16 GB RAM using Matlab 2015b and C++.

A. PARAMETER SETTING
In the proposed method, we use WLS four times for alternating guidance, i.e., image denoising and detail transfer. Moreover, we utilize fast weighted median filtering (FWMF) [32] to remove salt-and-pepper noise. We set the number of WLS iterations to T = 3 to balance efficiency and quality.

B. PERFORMANCE EVALUATION
To verify the effectiveness and efficiency of the proposed method, we compare it with state-of-the-art fusion methods. The proposed method preserves the color information of RGB images by preventing color shift in fusion.
For quantitative measurement, we evaluate performance on the fusion results in terms of blind image quality assessment (BIQA) [31]. We adopt the no-reference metric BIQA because neither the RGB nor the NIR images can serve as reference images. Table 1 shows the BIQA scores of the different methods on the test image pairs; smaller scores represent better performance. Bold and underlined numbers indicate the best and second-best performance, respectively. The proposed method achieves the minimum BIQA scores and outperforms the others on average. Furthermore, the proposed method achieves higher computational efficiency than the others; its time complexity reaches O(N) through the fast global smoothing of WLS [16]. We measure the runtime on our testing dataset (29 pairs of RGB and NIR images) at a resolution of 1920 × 1080. Table 2 lists the average runtime (unit: sec/pair). Among the compared methods, Yan et al.'s work [30] achieves fusion performance comparable to ours; however, the proposed method is more than 30 times faster, because we use WLS-based fast global smoothing for the fusion of RGB and NIR images.

IV. CONCLUSION
We have proposed scale-aware multispectral fusion of RGB and NIR images based on alternating guidance. Low light RGB images contain color with coarse image structure, while NIR images include clean textures without color. We have adopted scale-aware multispectral fusion so that the fusion results contain the multiscale structures of both RGB and NIR images, and we have used alternating guidance based on WLS to maximize the multispectral advantage. In the first guidance, the joint guidance of the NIR image and the denoised RGB luminance removes noise while keeping edges; in the second guidance, an over-smoothed RGB image guides the scale-aware transfer of NIR details into the fusion. Experimental results show that the proposed fusion method achieves good performance in noise removal and detail transfer with high computational efficiency.