Hyperspectral and Panchromatic Image Fusion via Adaptive Tensor and Multi-Scale Retinex Algorithm

Fusion of panchromatic image (PANI) and hyperspectral image (HSI) to obtain an output image with high spatial and spectral resolutions has received increasing interests recently. We propose a new image fusion method for HSI and PANI by combining adaptive tensor with a multi-scale Retinex algorithm in this paper. In the proposed method, an adaptive tensor based method is presented to effectively extract the structure information of HSI, and multi-scale Retinex algorithm is introduced to obtain the spatial and structure details of PANI. To integrate spatial structure information, a gradient-based weighted fusion strategy is proposed to combine spatial details of HSI and PANI. The integrated structure details are injected to generate the fused HSI. Experiments using both simulated and real remote sensing data sets demonstrated that the proposed fusion algorithm performs better than the state-of-the-art algorithms in visual inspection and objective assessment.


I. INTRODUCTION
Due to the technical constraints, information collected by a single sensor only reflects partial characteristics of an object, but cannot reflect complete characteristics. Multiple sensors can reflect more complete characteristics and information. The hyperspectral (HS) remote sensing sensors provide the HS imagery (HSI) which has abundant spectral information [1]. The Panchromatic (PAN) remote sensing sensors are capable of providing the PAN image (PANI) that possesses high spatial resolution (HSR). The HSI is a three-dimensional data, and has been used in various fields [2]- [5]. However, the HSI usually has low spatial resolution (LSR). Similarly, the PANI contains limited spectral information. The plentiful spatial information provided by the PAN images (PANIs) is helpful to locate the objects accurately [6]. The HS The associate editor coordinating the review of this manuscript and approving it for publication was Michele Nappi . images (HSIs) that have high spectral resolution are utilized to recognize the materials [6]. For the sake of combining the aforementioned advantages of the two images, the PANI and HSI fusion is necessary and significative.
Generally, CS algorithms decompose the HSI into spectral and spatial components, substitute spatial information with PANI, and construct the fused HSI via using the inverse transformation. These algorithms generally have the advantages that they are fast and simple, and have excellent spatial performance [14]. However, they usually have serious spectral distortion [17].
Bayesian and matrix factorization methods which are proposed in this decade generally yield satisfactory fusion results, but these methods usually take high computational cost [18]. Bayesian approaches contain methods such as Bayesian HySure [18], Bayesian sparse (BSF) [19], [20], and Bayesian naive [21]. These methods employ a appropriate prior distribution to solve a optimized model [6]. Matrix factorization based methods such as constrained nonnegative matrix factorization (CNMF) [22] usually utilize nonnegative matrix factorization (NMF) [23], [24] to obtain the fused image.
In addition, some advanced and innovative ideas for the fusion processing have been proposed recently [25]- [32]. Among them, the more popular fusion methods are based on deep learning [27]- [31]. Dian et al. learned the image priors through convolutional neural network (CNN)-based residual learning to sharpen the HSI [28]. Shao et al. adopted a two branches CNN network to extract features of HSI and PANI, and utilized one main thread to fuse the extracted features for generating the fused HSI [29]. Zhang et al. presented an endto-end bidirectional pyramid network [31], and this network processed HSI and PANI in two separate branches level by level.
Structure tensor matrix at a pixel has two eigenvalues which describe the spatial structure information at this pixel. The larger eigenvalue represents the edge intensity of this pixel. Multi-scale Retinex algorithm can decompose an image into shape-dependent and reflectance components. Based on the idea of tensor matrix and Multi-scale Retinex algorithm, we present a new HSI and PANI fusion method, where an adaptive tensor based algorithm is presented to extract the details of HSI. Meanwhile, three-scale Retinex algorithm is introduced to obtain the structure details of PANI. Then, the proposed approach presents a gradient based weighted way to acquire the integrated spatial details of HSI and PANI. The validity and effectiveness of the proposed approach is verified by testing experiments and comparing with other hyperspectral remote sensing image fusion methods.
The rest of the paper is organized as follows. Section 2 introduces related work. Section 3 provides the details of the proposed method. Section 4 displays experimental results. Finally, Section 5 gives the conclusions.

II. RELATED WORK
A. STRUCTURE TENSOR Structure tensor [33] which represents the gradient and structure details of an image has been successfully used in various applications [34]- [38]. For an image U, structure tensor J is defined as where ∇U = U x U y T is the gradient operator, U x = ∂U ∂x and U y = ∂U ∂y are x and y derivatives of U, and () T is transpose operation. To consider the structure information of neighborhood at this pixel, the Gaussian function is convoluted with J as where J G represents the resulting tensor matrix, G τ represents Gaussian function, τ represents standard deviation, and * represents the convolution operation. Since J G is a positive semi-definite matrix, it has two nonnegative eigenvectors and can be decomposed as where ν 1 and ν 2 are the eigenvectors, and β 1 and β 2 are the corresponding nonnegative eigenvalues. The eigenvalues β 1 and β 2 describe the spatial structure information. If β 1 ≈ β 2 ≈ 0, the region is flat area. When β 1 > β 2 ≈ 0, the region is edge area. If β 1 ≥ β 2 > 0, the area is a corner. The larger eigenvalue indicates the edge intensity of the image at this pixel. The corresponding eigenvector is the gradient direction of this pixel.

B. INTRINSIC IMAGE DECOMPOSITION AND RETINEX ALGORITHM
Intrinsic image decomposition (IID) has been used in many image processing problems [39], [41]. Given an image, the IID is represented as where O is an input image, S is an illumination and shape-dependent component, and R is a reflectance component. The reflectance component R is related to the material of objects, and the illumination shading information S depends on the illumination of the scenes and the texture and structure information of objects. In [41], IID is utlized to obtain reflectance and illumination components of HSI, where R is extracted as spectral component of HSI, and S is served as spatial and texture structure details of HSI. IID is introduced to PANI in this paper, and the illumination and shape-dependent component S that is the spatial and structure information is extracted. How to solve the intrinsic image decomposition equation is a challenging topic. Various methods have been proposed to solve this ill-posed equation, such as Retinex algorithm [42], [43], global optimization method [44], and extra constraints method [45]. Retinex algorithm can effectively show the detail information, decompose the reflectance and illumination components, and compress the dynamic range. Compared with the single scale Retinex algorithm, multi-scale Retinex algorithm is more effective in detail contrast and extraction aspects [43]. In this paper, multi-scale Retinex algorithm is adopted for the intrinsic decomposition of PANI.

III. PROPOSED METHOD
This section describes the proposed method that are based on adaptive tensor and multi-scale Retinex algorithm. Fig. 1 shows its schematic diagram. Let H ∈ R l×w×d denote the LSR HSI, and P ∈ R L×W ×1 denote the HSR PANI. When H ∈ R l×w×d and P ∈ R L×W ×1 are given, these two images can be fused to generate a new HSI which has high spatial and spectral resolution. Let H ∈ R L×W ×d denote the fused HSI. Here, l and L (note that l < L) are the respective heights of two images, w and W (note that w < W ) are the respective widths, and d is the number of the HSI bands. The objective of HSI and PANI fusion is to effectively improve spatial resolution while preserving the spectral information of the original HSI.

A. ADAPTIVE TENSOR BASED METHOD FOR HSI
The original HSI H is interpolated to generate the same dimension as PANI. Let H ∈ R L×W ×d represent the interpolated HSI. An adaptive tensor based algorithm is presented to extract the spatial information of HSI. Based on Equation (1) and (2), for each pixel of each band, the structure tensor matrix is obtained, and the Gaussian function G τ is applied to the obtained structure tensor as . . , L, y = 1, 2, . . . , W , H m x,i and H m y,i are the x and y partial derivatives at pixel i of the mth band, J m i is the tensor matrix at pixel i of the mth band, J m i is the convolutional tensor matrix at pixel i of the mth band, and the standard deviation τ is set to 0.5. The convolutional tensor which is a semi-definite matrix can be decomposed as where ν m 1,i and ν m 2,i are the eigenvectors at pixel i of the mth band, and β m 1,i and β m 2,i are the eigenvalues at pixel i of the mth band. Assume that β m 1,i is greater than β m 2,i . The larger eigenvalue β m 1,i indicates the edge intensity at pixel i of the mth band.
The spatial component of HSI is obtained by the following weighted formula where I H ∈ R L×W ×1 denotes the spatial details of HSI, I H,i denotes the spatial information at pixel i, H m i denotes the intensity of the interpolated HSI at pixel i of the mth band, and a m i represents the weighting coefficient at pixel i of the mth band. The weighting coefficient a m i is adaptively acquired by analyzing the larger eigenvalue β m 1,i . Since β m 1,i describes the edge intensity at pixel i of the mth band, the greater β m 1,i , the more spatial information at pixel i of the mth band. Therefore, the weighting coefficient a m i is adaptively obtained.

B. MULTI-SCALE RETINEX ALGORITHM FOR PANI
For the sake of enhancing the spatial information, Laplacian of Gaussian (LoG) approach is applied to PANI. The LoG method employs a Gaussian lowpass filter to reduce noise, and utilizes a Laplace operator to sharpen the details. The obtained result is finally added with the original PANI.
where P ∈ R L×W ×1 denotes the enhanced PANI, * denotes the convolution operator, and f L denotes the function of LoG operator which is defined as where σ denotes the standard deviation. In Equation (10), h is a constant related to the central coefficient of the LoG kernel. If the central coefficient is positive, h is equal to 1. Conversely, h is −1. In this work, the coefficient is a negative value, and h is −1.
To decompose the illumination and reflectance components of the enhanced PANI, Multi-scale Retinex algorithm is introduced. According to [41], the illumination and shape-dependent component describes spatial and structure information. Based on Equation (4), the intrinsic image decomposition of P is represented as where S P ∈ R L×W ×1 represents the illumination and shape-dependent component of the enhanced PANI, and R P ∈ R L×W ×1 is the reflectance information of the enhanced PANI.
On the basis of the multi-scale Retinex algorithm, three scales which are high, medium and low are selected, and the reflectance component which is related to the materials is estimated as r P (x, y) = log(R P (x, y)) = 1 3 3 n=1 log( P(x, y)) − log( P(x, y) * g n (x, y)) (13) where r P is the log form of R P , and g n is the Gaussian surround function which can be expressed as s.t g n (x, y)dxdy = 1 (14) for n = 1, 2, 3,, where ω n represents the scale factor of g n , ω 1 , ω 2 , ω 3 are set to 16, 32, and 64. Q n which is determined by constraint condition is the normalization factor. In Equation (13), the convolution part P(x, y) * g n (x, y) is calculated in the frequency domain.
P(x, y) * g n (x, y) = −1 ([ ( P(x, y))] × [ (g n (x, y))]) (15) where denotes Fourier transform and −1 denotes inverse Fourier transform. The reflectance component which depends on the intrinsic nature and materials of objects is obtained by using Equation (13), and then the illumination and shape-dependent component that describes the texture and structure information is acquired by

C. GRADIENT BASED WEIGHTED FUSION STRATEGY
After the spatial details of HSI I H and the structure information of PANI S P are extracted, a gradient based weighted fusion strategy is presented to obtain the integrated spatial details. Gradient operation is applied to I H and S P . A larger gradient value at each pixel indicates more spatial information. This gradient based weighted fusion strategy is described as  (17) where D ∈ R L×W ×1 represents the integrated spatial details, D i represents the spatial information at pixel i, I Hx , I Hy , S Px , and S Py are the x and x partial derivatives of I H and S P , respectively, and I Hx,i , I Hy,i , S Px,i , and S Py,i are their values at pixel i. By this weighted strategy, the spatial information of HSI and PANI is considered simultaneously. In order to inject the integrated spatial information D into the interpolated HSI H with less spectral and spatial distortion, a gains matrix T is constructed as follows. The proportion between each pair of HSI bands remains unchanged to maintain spectral information. Then, a tradeoff parameter λ is defined to control the amount of the injected structure information, which reduces spatial distortion. Define where T m is the gains matrix of the mth band. The fused HSI with high spectral and spatial resolution is generated by combining the integrated spatial details with the interpolated HSI for each band as

A. DATA SETS
For the sake of assessing the performance of the proposed approach (named as ATMR), three synthetic data sets and one real data set are used in the experiments. Three synthetic data sets include Salinas, Moffett field, and Washington DC, and the real data set is Hyperion. Several characteristics of these four tested data sets are summarized in Table 1.
Moffett field data set was collected by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS), and is a standard data product of AVIRIS [6]. These HSIs have 224 bands ranging from 0.4 − 2.5µm, and 176 bands are used for experimentation after removing the noise bands and water absorption bands. The spatial resolution of PANI and HSI are 20m and 100m. The tested PANI and HSI are of size 250 × 150 pixels and 50 × 30 pixels.
Salinas data set is another standard data product which was collected by AVIRIS [6]. This data set includes 224 bands covering the spectral range 0. For the synthetic data sets, including Salinas, Moffett field, and Washington DC, the HSIs are available, and the PANIs are not available. The available HSI is served as the reference HSI. Based on Wald s protocol [46], the synthetic HSI and PANI are generated from the available HSI. The reference HSI is spatially blurred by applying a Gaussian kernel and downsampled by a factor of 5 to create the simulated LSR HSI. The simulated high resolution PANI is created via averaging the visible range bands of the reference HSI.
Hyperion data set was provided by the Hyperion instrument which is carried on the EO-I spacecraft [6]. The EO-I spacecraft also carried another instrument that was Advanced Land Imager (ALI) [6]. The HSIs that were collected by Hyperion instrument contain 242 bands covering spectral range of 0.4 − 2.5µm. The noise and water absorption bands are removed, and 171 bands are utilized to be tested. ALI instrument provided PANI with the spatial resolution of 10 m. The size of PANI and HSI in the experiment are 360 × 360 and 120 × 120.

B. COMPARED METHODS AND FUSION QUALITY METRICS
The proposed ATMR method is compared with several popular hyperspectral remote sensing image fusion methods, including MGH [9], BSF [19], CNMF [23], GS [13], guided filter PCA (GFPCA) [47], GSA [14], an image segmentation-based method (SEGM) [25], and an optimization constraint equation and sliding window-based method (OCSW) [26]. To compare the performance of each approach, several widely used quality metrics are used. The first quality index is cross correlation (CC) [48], which is a spatial quality measure. CC characterizes the geometric distortion, and the maximum value is 1. spectral angle mapper (SAM) index is used for measuring the degree of spectral distortion [48], and the optimal value is 0. The third index is root mean squared error (RMSE). Erreur relative global adimensionnelle de synthse (ERGAS) [49] is the fourth index. The RMSE and ERGAS metrics are global measures which evaluate spatial and spectral qualities respectively, and 0 is their ideal value.
Salinas, Moffett field, and Washington DC data sets are the synthetic data sets. The synthetic data sets use the available HSI serving as the reference HSI. The synthetic HSI and PANI which are generated from the HSR HSI are fused by each method to obtain the fused HSIs. The fused HSIs are compared with the reference HSI to evaluate the objective performance.
Hyperion data set which is a real data set is utilized to evaluate the fusion capability in real hyperspectral image. For the real data set, we generally can not obtain the reference HSI. To assess the objective performance of each method, the original available low resolution HSI is served as the reference HSI. This reference HSI and the real PANI are all degraded to obtain the degraded HSI and PANI on the basis of Wald s protocol [46]. The degraded HSI and PANI are fused to generate the fused HSI, which is compared with the original HSI to assess the objective performance.

C. PARAMETER DISCUSSION
In this part, we analyze the effect of tradeoff parameter λ. λ controls the quantity of the injected spatial information. Since λ controls the spatial distortion, we use the CC, RMSE and ERGAS indexes that evaluate the spatial quality to determine an optimal value of λ. The experiment is performed on the Moffett field data set, and experimental results are shown in Fig. 2. We analyze the results of CC, RMSE and ERGAS, and find that the better results are obtained when λ is set to 0.12. Thus, for the Moffett field data set, we set the value of λ as 0.12. The same way is applied on the Salinas, Washington DC, and Hyperion data sets, and the values of λ for these three data sets are 0.05, 0.07, and 0.05, respectively.

D. EXPERIMENTAL RESULTS WITH THE SIMULATED HYPERSPECTRAL DATA SET
The first and third row of Fig. 3 show the fused HSIs generated by different approaches for the Moffett field data set. Fig. 3(a1) is the reference HSI. Visually, the fused result of the GS approach suffers from spectral distortion, especially in some urban regions. Compared with other approaches, GFPCA looks blurry, and the spatial details of the fused HSI is not sufficient. Although the GSA, SEGH, and CNMF algorithms have good capability in the spatial aspect, their results yield slight spectral distortion. By comparison, the MGH, OCSW, BSF, and ATMR methods acquire better fusion effects. The MGH, OCSW, BSF, and ATMR methods all have good fidelity of the spatial details. However, by further comparison, the MGH and ATMR approaches yield better performance in maintaining the spectral information compared with the BSF and OCSW approaches. The fused result of the MGH method is too sharp in certain regions, such as building regions. The proposed ATMR offers excellent capability in both spectral and spatial aspects. The second and fourth row of Fig. 3 display error images (absolute values) of each method to further illustrate the fusion quality of each method. Fig. 3(a2) shows the standard image. ATMR causes the smallest differences between the fused and reference HSIs. Objective indexes for Moffett field data are computed and reported in Table 2. The MGH provides the best CC value, and followed by the ATMR. For the SAM, ERGAS, and RMSE indexes, the proposed ATMR obtains the best results, and demonstrates the excellent fusion performance.
The first and third row of Fig. 4 are given to show the subjective fusion images of each compared method for the Salinas data set. The spectral fusion property of the GS approach is unsatisfactory. Spectral distortion of the GS approach is serious, especially in soil and vegetable areas. Compared with the GS approach, the spectral performance of the fused HSI of the GFPCA algorithm is improved. But the GFPCA method has the insufficient spatial details in many regions. Although the OCSW and BSF algorithms seem have the excellent fusion property, the OCSW and BSF algorithms have a little spatial distortion in the edge and vegetable regions. The SEGM causes spectral distortion, especially in some edges. The visual analysis shows that the CNMF and GSA approaches have excellent spectral preserving property, but spatial structures and details in some vegetable regions are insufficient. The MGH and the ATMR perform well, but the spatial details of the MGH algorithm are too sharp. Same as the Moffett field data, the second and fourth row of Fig. 4 show the error images of each method for the Salinas data. A visual comparison shows that the ATMR has the minimum differences. The quantitative metrics are given in Table 3, and the ATMR outperforms the other competing methods in both spectral and spatial aspects.
The first and third row of Fig. 5 show the fused HSIs of different algorithms for Washington DC data. Fig. 5(a1) displays the reference HSI. Visually, the GS and CNMF methods effectively improve the spatial information, but these two methods introduce spectral distortion in some building and road regions. Same as the conclusions drawn in the Salinas and Moffett field data, some spatial structure information is missing in the fusion result obtained by the GFPCA method on Washington DC data. GSA approach exhibits the good spectral preservation effect. However, the spatial details produced by the GS approach are slightly deficient, such as in the building roof areas. The fused HSIs generated by MGH and OCSW have good spatial fidelity, but have slight spectral distortion. The SEGM, BSF, and ATMR methods produce the better vision effect than other competitive methods. To better demonstrate the advantage of the ATMR algorithm, the second and fourth row of Fig. 5 display the error images of  each algorithm. The error image from the ATMR method is the closet to the standard one as shown in Fig. 5(a2). This demonstrates that ATMR has superior performance in enhancing spatial information and preserving spectral information. Furthermore, the quantitative results are obtained and reported in Table 4, indicating that ATMR achieves the best fusion quality. The ATMR method obtains the smallest SAM, RMSE, and ERGAS values.    Fig. 6(a) and Fig. 6(b) show the real LSR HSI and HSR PANI, respectively. Fig. 6(c) shows the interpolated HSI. Fig. 6(d)-(l) presents the fused HSIs generated by each algorithm. The GS algorithm has significant spectral distortion, and the GFPCA algorithm exhibits obvious spatial distortion. By the comparisons of these fused results, CNMF, GSA, and SEGM perform good spectral fidelity, but they exhibit slight spatial blur. By careful observation, the MGH, OCSW, BSF, and the proposed ATMR approaches have good vision appearance in the spatial VOLUME 8, 2020  and spectral aspects. Table 5 presents the objective performance results of Hyperion data. It is evident that the ATMR performs the best. The values of SAM, RMSE, and ERGAS are the smallest, and the CC has the largest value.
The time complexity of different algorithms is compared, and the average computational times (s) of each method are shown in Table 6. As shown in Table 6, the GFPCA, GS, and GSA approaches are very fast. However, the pansharpened HSIs generated by these approaches are unsatisfactory.  MGH is also fast, but the spatial details of the fused HSI obtained by MGH may be too sharp. That is, they will be able to provide the simple fusion framework with low computational complexity by compromising fusion performance. The proposed ASTMR is slower than the CNMF, SEGM, and OCSW algorithms, and is faster than BSF. ASTMR is not very competitive in terms of the running time, but ASTMR has better fusion performance than other algorithms in both spatial and spectral aspects. We will develop parallel processing system to accelerate the running speed of the ASTMR method in the future research.

V. CONCLUSION
In this paper, a novel HSI and PANI fusion approach called ATMR is proposed. It presents an adaptive tensor method to obtain the spatial information of HSI, and introduces the Multi-scale Retinex algorithm to obtain the structure information of PANI. In addition, it employs a gradient based weighted fusion strategy to integrate the obtained spatial details of HSI and PANI. The proposed ASTMR algorithm is tested on four hyperspectral data sets which demonstrates it can effectively enhance spatial resolution and maintain spectral information of HSI.