Hyperspectral Superresolution Reconstruction via Decomposition of Low-Rank and Sparse Tensor

The hyperspectral superresolution reconstruction technique obtains a high-resolution hyperspectral image (HR-HSI) by fusing a low-resolution hyperspectral image (LR-HSI) and a high-resolution multispectral image. Existing methods of hyperspectral superresolution reconstruction mostly concentrate on the global low-rank property, while the spatial and spectral information in individual regions is not considered. To address this issue, the decomposition of low-rank and sparse tensor is proposed in this study. First, the HR-HSI is decomposed into low-rank and sparse components. The former is further separated into spatial and spectral domains according to the spectral low-rank property, and the latter compensates for the information loss caused by the low-rank constraint. Then, a nonlocal constraint of adaptive manifold, which extracts structural details through the manifold structure, is designed to enforce the nonlocal self-similarity of the spatial domain. To ensure the same spatial structure across different bands and to reduce false individual regions in the sparse component, surface-aware regularization combined with group sparsity is utilized. Finally, the HR-HSI is reconstructed by the alternating direction method of multipliers. Experimental results on three datasets show that the proposed method outperforms five common existing methods in both visual and quantitative evaluations. It is concluded that the new method, by taking the low-rank and sparse properties into account, can improve the reconstruction result.


I. INTRODUCTION
HYPERSPECTRAL imaging is a new means of observation that can capture multiple narrow-band spectral images of the same scene [1]. As the number of spectral bands increases, the energy captured by each spectral band decreases, reducing the spatial resolution of the images in these bands. The low spatial resolution of hyperspectral images (HSIs) makes it difficult to meet the needs of applications such as classification [2], [3], change detection [4], [5], and object detection [6], [7]. In order to improve the spatial resolution of HSIs, hyperspectral superresolution reconstruction has recently been introduced. Low-resolution HSIs (LR-HSIs) have a low spatial but high spectral resolution, whereas high-resolution multispectral images (HR-MSIs) have a high spatial but low spectral resolution. High-resolution HSIs (HR-HSIs), with both high spatial and high spectral resolution, can be obtained by fusing LR-HSIs and HR-MSIs [8] in the hyperspectral superresolution reconstruction. Pan-sharpening, an early technique for hyperspectral superresolution reconstruction, can be divided into four categories: 1) component substitution (CS), 2) multiresolution analysis (MRA), 3) Bayesian, and 4) variational methods. The CS method replaces hyperspectral spatial components from an LR-HSI with multispectral spatial components from an HR-MSI, where these components are obtained by specific algorithms such as intensity-hue-saturation (IHS) [9], [10], principal component analysis (PCA) [11], [12], [13], and Gram-Schmidt (GS) spectral sharpening [14]. Different from the CS, the MRA keeps all the content of the LR-HSI and adds information obtained from the HR-MSI through spatial frequency filtering. The MRA extracts the spatial details of the HR-MSI through wavelet transformation [15], [16], Laplacian pyramids [17], [18], and nonseparable transforms [19].
Then, these details are injected into the LR-HSI at the pixel, window, or region level. However, the CS and MRA methods focus only on spatial details, which often causes serious spectral distortion in the obtained HR-HSI. To maintain spectral information, Irmak and Wei regarded pan-sharpening as an appropriate method for reconstructing HR-HSIs and established a spectral image degradation model [20], [21]. Based on this model, the Bayesian method [20], [21], [22] assumes that the image noise follows a Gaussian distribution, and the posterior distribution of the Bayesian model is used to simulate the hyperspectral superresolution reconstruction. Due to the incomplete spatial-spectral information of the observed images (the LR-HSI and HR-MSI), the reconstruction is regarded as an underdetermined inverse problem, and the potential HR-HSI requires a-priori constraints. The variational method resolves the ill-posedness of the hyperspectral superresolution reconstruction by adding total variation regularization [23], [24], [25] to the Bayesian method. Although the above methods can improve the spatial resolution, the spectral correlation is significantly weakened, resulting in spectral distortion of the reconstructed HR-HSI. Spectral unmixing, a technique addressing spectral correlation, represents the spectra of mixed pixels as linear combinations of the spectra of pure pixels [26]. Further research [27], [28], [29], [30] decomposed the HR-HSI into a spectral basis and coefficients by combining spatial superresolution and spectral unmixing, and the results of the two were obtained with interactive feedback. If only the spectral correlation is used to decompose an unknown HR-HSI, it is difficult to obtain information about the spatial and spectral structure. To further explore this information, factorization methods have been investigated for reconstructing an HR-HSI.
The unknown HR-HSI is regarded as a matrix, which can be factored into a low-dimensional subspace and coefficients. The research in [31] and [32] used subspace-based regularization as the a-priori constraint to alternately update the low-dimensional subspace and the coefficients. Inspired by matrix factorization, Dian et al. [33] proposed a tensor factorization method: the unknown HR-HSI was regarded as a 3-dimensional tensor and decomposed via Tucker decomposition into four factors, namely a core tensor multiplied (via n-mode products) by three dictionary (factor) matrices, one along each mode. The research in [34], [35], and [36] utilized coupled sparse, nonlinear, and low-rank tensor models to obtain these four factors, respectively. To effectively utilize the self-similarity of the unknown HR-HSI, a-priori low-rank constraints were used for hyperspectral superresolution reconstruction [37], [38], [39]. The research in [40] designed a structured sparse low-rank representation to extract spatial and spectral representation coefficients, constraining the unknown HR-HSI.
With the rise of deep learning, the research in [41] and [42] used convolutional neural networks (CNNs) to construct deep priors for the reconstruction of HR-HSIs. Dian et al. [43] proposed a deep HSI sharpening method, which directly learned the image priors via deep CNN-based residual learning. A well-trained CNN denoiser was added as a regularizer in the reconstruction of HR-HSIs [44]. These methods achieved significantly improved performance.
The aforementioned approaches attempt to characterize the spatial structure of an unknown HR-HSI by decomposition. However, since the scenes of HSIs are often complex, it is difficult to characterize an unknown HR-HSI by a single global attribute such as low-rankness. Specifically, HSIs contain many individual regions that occupy a very small area and differ from their surroundings in spectral properties. The existing methods can hardly characterize these small regions well, which leads to a loss of regional spatial and spectral information in the hyperspectral superresolution reconstruction. Therefore, the key to hyperspectral superresolution reconstruction is to accurately characterize the spatial and spectral information of the HR-HSI in both the global and the individual special regions.
To characterize the above individual special regions, in this study, a new method named decomposition of low-rank and sparse tensor (DLST) is proposed, and its implementation procedure is as follows.
1) The unknown HR-HSI was divided into two components: a low-rank one and a sparse one. The low-rank component, capturing the global attribute of the HR-HSI, was further separated into spectral and spatial parts by the spectral low-rank property. The sparse component extracted spectral information in individual special regions.
2) A nonlocal constraint of adaptive manifold was designed for the global region, which used manifolds for grouping and adaptive weight determination, and balanced the local low-rank property and the nonlocal self-similarity of the spatial part.
3) Surface-aware regularization combined with group sparsity (GS) was used to constrain the sparse component, which enforced the same spatial structure across different bands and reduced the false individual regions.
4) An iterative optimization method was used to recursively update both the low-rank and sparse components to reconstruct the HR-HSI.
The remainder of this article is organized as follows. Section II introduces the model of hyperspectral superresolution reconstruction and the tensor train rank (TTR). The proposed approach is introduced in Section III. In Section IV, experimental results on three datasets are presented. Conclusions are given in Section V.

II. PRELIMINARIES AND NOTATIONS
In this section, the model of the hyperspectral superresolution reconstruction and the TTR are introduced. The model of hyperspectral superresolution reconstruction, built on an objective function, transforms the reconstruction into an optimization problem by exploring the relationship among the HR-HSI, HR-MSI, and LR-HSI of the same scene. The TTR is used to measure the low-rank property of the HR-HSI during the reconstruction.

A. Problem Model
An HR-HSI, the target of hyperspectral superresolution reconstruction, is expected to carry complete spatial and spectral information. Compared with the HR-HSI, the HR-MSI of the same scene lacks complete spectral information. The spectral information of the HR-MSI can be regarded as a linear combination of the complete spectral information of the HR-HSI to simulate the reduction of spectral resolution from the HR-HSI to the HR-MSI [45]. Let X ∈ R^(H×M×N) be the HR-HSI with M × N spatial resolution and H spectral bands. Let Y_h ∈ R^(h×M×N) be the HR-MSI with M × N spatial resolution and h spectral bands. The relationship between X and Y_h is often modeled as

Y_h(3) = H X_(3)    (1)

where X_(3) ∈ R^(H×MN) and Y_h(3) ∈ R^(h×MN) are the three-mode unfoldings of X and Y_h, respectively; H ∈ R^(h×H) is a matrix whose rows contain the spectral responses of the multispectral sensor (H > h).
Compared with the HR-HSI, the LR-HSI of the same scene has a lower spatial resolution. Each spectral band image of the LR-HSI can be regarded as a spatial degradation of the corresponding band image of the HR-HSI, simulating the reduction of spatial resolution from the HR-HSI to the LR-HSI. Let Y_H ∈ R^(H×m×n) be the LR-HSI with m × n spatial resolution and H spectral bands; then, the relationship between X (for the HR-HSI) and Y_H (for the LR-HSI) is often modeled as

Y_H(3) = X_(3) M    (2)

where Y_H(3) ∈ R^(H×mn) is the three-mode unfolding of Y_H; M ∈ R^(MN×mn) is the spatial degradation operator, which is often assumed to be composed of a cyclic convolution operator and a down-sampling matrix (M > m and N > n). As a result, X can be obtained by solving the following objective function:

min_X ‖Y_h(3) − H X_(3)‖_F^2 + ‖Y_H(3) − X_(3) M‖_F^2    (3)

where ‖·‖_F denotes the Frobenius norm, which is used to measure the similarity between, e.g., the matrices Y_h(3) and H X_(3).
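As a concrete illustration (not the authors' implementation), the two degradation models above can be sketched in NumPy; the image sizes, the random spectral response, and the simple box-average blur are illustrative assumptions:

```python
import numpy as np

# Illustrative sizes (assumptions, not the paper's datasets)
H_bands, M, N = 31, 64, 64      # HR-HSI: H spectral bands, M x N pixels
h_bands, s = 4, 4               # MSI bands, spatial scaling factor
m, n = M // s, N // s           # LR-HSI spatial size

X = np.random.rand(H_bands, M, N)        # unknown HR-HSI
X3 = X.reshape(H_bands, M * N)           # three-mode unfolding X_(3)

# Spectral degradation (1): Y_h(3) = H X_(3), H is the h x H spectral response
Hresp = np.random.rand(h_bands, H_bands)
Hresp /= Hresp.sum(axis=1, keepdims=True)
Yh3 = Hresp @ X3

# Spatial degradation (2): here a simple s x s box average plus down-sampling
# standing in for the cyclic blur and down-sampling matrix M
YH = X.reshape(H_bands, m, s, n, s).mean(axis=(2, 4))  # average disjoint blocks
YH3 = YH.reshape(H_bands, m * n)

print(Yh3.shape, YH3.shape)   # (4, 4096) (31, 256)
```

The fidelity terms in the objective function then compare these two simulated observations against the degraded versions of a candidate X.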
Hyperspectral superresolution reconstruction is transformed into the optimization problem of equation (3), which is severely ill-posed, and its solution may not be unique. To ensure a stable solution of X, various constraints can be added, and these constraints can be divided into implicit and explicit ones. The implicit constraints are usually obscure; e.g., the matrix factorization [46] and Tucker factorization [37] of the tensor X in the spectral direction are tight constraints on the factorization form of X, exploiting the spectral low-rank property and the tensor low-rank property of X, respectively. The explicit constraints are used to regularize the hyperspectral superresolution reconstruction. The regularizations include the spectral unmixing [30], sparseness [32], nonlocal-spatial similarities [35], spatial smoothness, and low-rankness [46] of X. These constraints restrain the ill-posed solution of X, but they may also introduce errors to X. Hence, there is still a lack of more appropriate constraints for solving X.

B. Tensor Train Rank
For an N-dimensional tensor A ∈ R^(I_1×...×I_N), its tensor train (TT) decomposition [47] is composed of N 3-dimensional core factors G_1, G_2, ..., G_N, where G_n ∈ R^(r_n×I_n×r_(n+1)) for n = 1, 2, ..., N with r_1 = r_(N+1) = 1. Each element of A is calculated as

A(i_1, i_2, ..., i_N) = G_1(:, i_1, :) G_2(:, i_2, :) ... G_N(:, i_N, :)    (4)

where G_n(:, i_n, :) ∈ R^(r_n×r_(n+1)) is the i_n-th lateral slice of G_n, and [R_1, ..., R_N] is the TT rank of A. However, the TT decomposition of a tensor is not unique, i.e., different TT decompositions often lead to different TT ranks. In practice, the main focus is on the lower bound of the TT rank, R_n ≥ rank(A_(n)), where A_(n) is the n-mode unfolding of A. Therefore, the TT rank of A can be defined as

rank_TT(A) = [rank(A_(1)), rank(A_(2)), ..., rank(A_(N))]    (5)

where rank(A_(n)) equals the number of positive singular values of A_(n); it is nonconvex and thus difficult to optimize.
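The rank lower bound above can be checked numerically. The sketch below (an illustration, assuming the canonical matricization that groups the first n modes as rows) computes the ranks of the sequential unfoldings; for a rank-1 outer-product tensor every such rank is 1:

```python
import numpy as np

def tt_ranks(A):
    """Ranks of the sequential matricizations of A (lower bounds for TT ranks)."""
    dims = A.shape
    ranks = []
    for n in range(1, len(dims)):
        rows = int(np.prod(dims[:n]))
        An = A.reshape(rows, -1)          # matricization along the first n modes
        ranks.append(np.linalg.matrix_rank(An))
    return ranks

# A rank-1 outer-product tensor has all unfolding ranks equal to 1
a, b, c = np.random.rand(5), np.random.rand(6), np.random.rand(7)
A = np.einsum('i,j,k->ijk', a, b, c)
print(tt_ranks(A))   # [1, 1]
```

A tensor with richer structure (e.g., a sum of several such outer products) would show correspondingly larger unfolding ranks.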
To solve this problem, the convex nuclear norm is usually introduced to replace rank(A_(n)); the nuclear norm ‖A_(n)‖_* is equal to the sum of the singular values of A_(n); thus

‖A‖_TT = Σ_(n=1)^N ‖A_(n)‖_* = Σ_(n=1)^N Σ_i σ_i(A_(n))    (6)

where σ_i(A_(n)) is the ith singular value of A_(n). The log-sum of the singular values is used to shrink the larger singular values slightly and the smaller singular values strongly [48]. The TTR is defined as

‖A‖_TTR = Σ_(n=1)^N Σ_i log(σ_i(A_(n)) + ε).    (7)

III. METHOD

The architecture of the DLST method proposed in this study for hyperspectral superresolution reconstruction is shown in Fig. 1. The DLST was designed to decompose the HR-HSI, see Fig. 1(a). To maintain the spatial information of the HR-MSI, a nonlocal constraint of adaptive manifold via TTR was used to effectively preserve the local low-rank property and nonlocal self-similarity of the HR-HSI, see Fig. 1(b). To reduce the solution space of the sparse component, surface-aware regularization combined with GS was introduced, see Fig. 1(c). Finally, the reconstruction was transformed into an alternating optimization problem over multiple variables, see Fig. 1.

A. Decomposition of Low-Rank and Sparse Tensor
Recently, some new models of tensor decomposition have been proposed for HSI denoising [49], fusion [34], [39], and tensor completion tasks [50], [51], [52]. The global correlation across the spectrum and the nonlocal self-similarity are utilized by nonlocal low-rank regularized tensor decomposition [49]. Li et al. [34] and Dian et al. [39] utilized coupled sparse tensor factorization (CSTF) and low-TTR representation to characterize HSIs, respectively. Xue et al. [50] encoded the structured sparsity of a tensor by a multiple-layer representation. Xue et al. [51] proposed a parametric tensor sparsity, which encodes the sparsity of a general tensor by Laplacian scale mixture modeling. Xue et al. [52] presented an enhanced sparsity prior model using both local and global sparsity information in HSIs. These models describe an HSI using combinations of multiple attributes, since it is difficult to characterize HR-HSIs by a single attribute. Hence, the DLST, which considers both the low-rank and sparse properties, is constructed.
HR-HSIs are spectrally low-rank, and an HR-HSI can be divided into pure-pixel images, each of which has corresponding spatial coefficients [53]. Assuming that the number of pure pixels in the HR-HSI is K, then

X(:, i, j) = Σ_(k=1)^K W_k(i, j) h_k^T    (8)

where W_k ∈ R^(M×N) and h_k ∈ R^(1×H) are the spatial coefficients and pure-pixel matrices, respectively; h_k is an expression of the spectrum of the kth pure pixel (k = 1, 2, ..., K).
Theoretically, an HR-HSI can be expressed by (8). However, K (K ≪ M and K ≪ N), as the spectral rank, is set to a small value to constrain the HR-HSI. This causes the pure pixels in the individual regions covering small areas to be ignored as nonmain components. These pure pixels in the individual regions accord with sparsity because they cover very small areas and may even contain only a few pixels of the HR-HSI. Therefore, the sparse component is used to represent these individual regions and to supplement the information loss caused by the spectral low-rank constraint. The Euclidean distance (ED), as a measure of the similarity between X (the expected HR-HSI) and its low-rank approximation in (8), was computed by setting different values of K, as shown in Fig. 2(b). As can be seen from Fig. 2(a), in the red box, the cars park sparsely beside the green belt, and the spatial and spectral features of the cars can hardly be expressed when K is set to a small value, as shown in Fig. 2(b); thus, the individual regions need to be emphasized by the sparse component. An HR-HSI can be divided into two components: 1) a low-rank component X̃ and 2) a sparse component S, and thus, it can be written as

X = X̃ + S    (9)

where X̃ is composed of K pure pixels, and it can be expressed by the pure-pixel matrices and their corresponding spatial coefficients:

X̃(:, i, j) = Σ_(k=1)^K W_k(i, j) h_k^T.    (10)

The HR-HSI X can then be written as

X(:, i, j) = Σ_(k=1)^K W_k(i, j) h_k^T + S(:, i, j).    (11)

Equation (11) can be rewritten in the following unfolded form:

X_(3) = Σ_(k=1)^K h_k^T W̃_k + S_(3)    (12)

where W̃_k = vec(W_k)^T. According to Section II-A, combining (1) and (2), the HR-MSI and LR-HSI can be rewritten as

Y_h(3) = Σ_(k=1)^K H h_k^T W̃_k + H S_(3),   Y_H(3) = Σ_(k=1)^K h_k^T W̃_k M + S_(3) M.
Equation (3) can then be rewritten as

min_(W_k, h_k, S) ‖Y_h(3) − Σ_(k=1)^K H h_k^T W̃_k − H S_(3)‖_F^2 + ‖Y_H(3) − Σ_(k=1)^K h_k^T W̃_k M − S_(3) M‖_F^2.    (13)

The reconstruction problem expressed by (13) is now transformed into solving for W_k, h_k, and S. Some constraints are needed to narrow down the solution space of W_k and S in this ill-posed problem; h_k can be implicitly constrained by accurately estimating W_k and S. In this study, a nonlocal constraint of adaptive manifold and surface-aware regularization combined with GS were adopted to constrain W_k and S, respectively.
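The low-rank-plus-sparse model can be made concrete with a small NumPy sketch (sizes and values are illustrative assumptions): the low-rank part is built from K pure-pixel spectra and their spatial coefficient maps, and the sparse part places a few isolated spectral fibers that share the same spatial support across bands:

```python
import numpy as np

H, M, N, K = 31, 32, 32, 5                # illustrative sizes

h = np.random.rand(K, H)                  # h_k: 1 x H pure-pixel spectra
W = np.random.rand(K, M, N)               # W_k: M x N spatial coefficients

# Low-rank component, unfolded: rows of Wt are vec(W_k)^T
Wt = W.reshape(K, M * N)
Xtilde3 = h.T @ Wt                        # (H, M*N), spectral rank <= K

# Sparse component: a few "individual region" pixels, nonzero in every band
S3 = np.zeros((H, M * N))
S3[:, [10, 500]] = np.random.rand(H, 2)

X3 = Xtilde3 + S3                         # X_(3) = sum_k h_k^T W_k~ + S_(3)
print(np.linalg.matrix_rank(Xtilde3))     # at most K
```

The spectral rank of the low-rank part never exceeds K, which is exactly why small spectral anomalies must be carried by the sparse term.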

B. Nonlocal Constraint of Adaptive Manifold via TTR
Nonlocal self-similarity and the local low-rank property over space are two important characteristics of HR-HSIs [45]. In order to enhance the nonlocal self-similarity, the spatial coefficient W_k is divided into many nonlocal patches, and the most similar nonlocal patches are searched over the whole image. In this process, similarity measurement of the patches based on image intensity is rough: the patches in Fig. 3(c) are divided into two groups by the image intensity [see Fig. 3(a)], and the textures in the patches are ignored. In order to solve this problem, the manifold is introduced as the basis of the similarity measurement of the patches. The patches in Fig. 3(d) are divided into four groups by the manifold of the image [see Fig. 3(b)]. We can see that the manifold structure emphasizes texture details better than the intensity does. Therefore, grouping patches by manifold structure is better than grouping them by image intensity, because the former focuses more on texture.
The spatial coefficients W_k are divided into patches, and the patches with similar manifold structures in the HR-HSI are grouped together; the patches in the same group constitute a 3-dimensional tensor W^k_g. All patches in Fig. 1 are divided into G groups. The TTR of the 3-dimensional tensor W^k_g is used to make a unified tradeoff between the local low-rank property and the nonlocal self-similarity of the manifold structure for W_k, which is written as

Ψ(W_k) = Σ_(g=1)^G ‖W^k_g‖_TTR.    (14)

Additionally, as the HR-HSI is unknown, its patches cannot be grouped directly. To solve this problem, the grouping was conducted on the patches of the HR-MSI, since the spatial information is mainly contained in the HR-MSI. In the TTR, Rank(W^k_g(1)) and Rank(W^k_g(2)) reflect the local low-rank property of W_k, and Rank(W^k_g(3)) reflects the nonlocal self-similarity of the manifold structure for W_k. Their weights need to be determined. The weight of Rank(W^k_g(3)) is expected to be larger for a group with a notable manifold structure, and the weights of Rank(W^k_g(1)) and Rank(W^k_g(2)) are expected to be larger for a group with an insignificant manifold structure. In this study, the weights were determined by the sparse norm of the manifold vector gradient, since the sparse norm can measure the manifold strength. Thus, the nonlocal constraint of the adaptive manifold via TTR can be formulated as

Ψ(W_k) = Σ_(g=1)^G Σ_(n=1)^3 α_(g,n) Σ_i log(σ_i(W^k_g(n)) + ε)    (15)

where α_(g,n) is the adaptive weight of the nth unfolding of the gth group. For the manifold of an image, more attention is paid to the texture details than to the intensity of the image, and more attention is paid to the nontexture regions than to the gradient of the image. The use of adaptive weights based on the image's manifold further magnifies this advantage. Therefore, the nonlocal constraint of the adaptive manifold via TTR can serve as an effective constraint on the spatial information of HR-HSIs in both simple and complex environments.
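The adaptive weighting idea can be sketched as follows; the gradient-based strength measure and the particular split of weight between the local modes and the nonlocal mode are assumptions for illustration, not the paper's exact rule:

```python
import numpy as np

def adaptive_weights(group):
    """group: (p, p, num_patches) tensor of stacked similar patches.
    Returns weights for the three unfoldings: strong manifold (texture)
    structure shifts weight to the nonlocal mode Rank(W_g(3))."""
    gx = np.diff(group, axis=0)                       # vertical differences
    gy = np.diff(group, axis=1)                       # horizontal differences
    strength = np.abs(gx).mean() + np.abs(gy).mean()  # proxy for manifold strength
    w3 = strength / (1.0 + strength)                  # nonlocal weight grows with texture
    w12 = (1.0 - w3) / 2.0                            # local modes share the rest
    return np.array([w12, w12, w3])

flat = np.ones((8, 8, 10))                # textureless group
textured = np.random.rand(8, 8, 10)       # strong local variation
print(adaptive_weights(flat), adaptive_weights(textured))
```

For the flat group all weight goes to the two local low-rank modes; for the textured group the nonlocal mode receives a substantial share, matching the behavior described above.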

C. Surface-Aware Regularization Combined With GS
The sparse attribute of the sparse component S is mainly reflected in the spatial domain. Standard sparsity minimizes the number of nonzero elements of S, treating each element of S as independent. Obviously, this ignores the fact that the different bands of S share the same spatial structure. In order to further preserve the potential relationship among the bands, the GS is used to ensure that the nonzero values occur at the same spatial locations in different bands. The GS not only ensures that the different bands of S are sparse but also preserves the potential relationship among the bands. The group sparse regularity of S in the spatial domain is expressed by the L1,2-norm

‖S‖_(1,2) = Σ_(i=1)^M Σ_(j=1)^N ‖S(:, i, j)‖_2.    (16)

The GS ensures that the sparsity of the sparse component preserves the individual regions, but it may also produce false individual regions. In order to eliminate them, surface-aware regularization is introduced to reduce the false individual regions and disadvantageous structures while retaining the individual regions.
The sparse component S can be divided into H frontal slices: S = {S^(1), ..., S^(h), ..., S^(H)}. In geometric frameworks for image processing, intensity images are considered as surfaces in the spatial-feature space. An image is thereby a 2-dimensional surface in a 3-dimensional space [54], and the surface area of the slice S^(h) can be written as

Φ(S^(h)) = Σ_(i,j) sqrt(1 + |∇_i S^(h)|^2 + |∇_j S^(h)|^2)    (17)

where |∇_i S^(h)|^2 and |∇_j S^(h)|^2 represent the difference matrices in the horizontal and vertical directions, respectively, and they are calculated via

∇_i S^(h)(i, j) = S^(h)(i+1, j) − S^(h)(i, j),   ∇_j S^(h)(i, j) = S^(h)(i, j+1) − S^(h)(i, j).    (18)

The surface-aware regularization of S is the sum of the surface-aware regularization of all frontal slices, which can be written as

Φ(S) = Σ_(h=1)^H Φ(S^(h)).    (19)

The surface-aware regularization combined with GS can then be defined as

ϕ(S) = λ Φ(S) + γ ‖S‖_(1,2)    (20)

where λ and γ are the weights of the surface-aware regularization and the GS, respectively.
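The effect of the L1,2 group sparsity can be illustrated with its standard proximal operator, which shrinks each spectral fiber by its L2 norm so that whole fibers survive or vanish together; treating the GS term via this prox is an assumption for illustration, not the paper's exact solver:

```python
import numpy as np

def group_soft_threshold(S, tau):
    """S: (H, M, N). Shrink each spectral fiber S[:, i, j] by its L2 norm,
    so the support (nonzero locations) is shared across all bands."""
    norms = np.linalg.norm(S, axis=0, keepdims=True)        # (1, M, N)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return S * scale

S = np.zeros((4, 3, 3))
S[:, 1, 1] = 2.0        # strong fiber: survives shrinkage in every band
S[:, 0, 0] = 0.1        # weak fiber: removed in every band simultaneously
out = group_soft_threshold(S, tau=0.5)
print(out[:, 0, 0], out[:, 1, 1])
```

The weak fiber is zeroed in all bands at once, which is exactly the "same spatial structure across bands" behavior that elementwise soft-thresholding cannot guarantee.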

D. Alternating Optimization
Obviously, estimating the HR-HSI from the HR-MSI and LR-HSI is an inverse problem. We incorporated the nonlocal constraint of adaptive manifold via TTR and the surface-aware regularization combined with GS into (13) to make the problem well-posed:

min_(W_k, h_k, S) ‖Y_h(3) − Σ_k H h_k^T W̃_k − H S_(3)‖_F^2 + ‖Y_H(3) − Σ_k h_k^T W̃_k M − S_(3) M‖_F^2 + α Σ_k Ψ(W_k) + λ Φ(S) + γ ‖S‖_(1,2).    (21)

Although the variables in (21) are coupled, they can be solved alternately, i.e., one variable is solved while the others are fixed in the corresponding minimization problem. Problem (21) can be broken into three subproblems, in which each variable is iteratively updated with the others fixed. The proximal alternating optimization (PAO) scheme [55], [56], which is guaranteed to converge to a critical point under certain conditions, is used to solve (21) and can be written as

W_k^(t+1) = arg min_(W_k) f(W_k, h_k^t, S^t) + ρ ‖W_k − W_k^t‖_F^2    (22)
h_k^(t+1) = arg min_(h_k) f(W_k^(t+1), h_k, S^t) + ρ ‖h_k − h_k^t‖_F^2    (23)
S^(t+1) = arg min_S f(W_k^(t+1), h_k^(t+1), S) + ρ ‖S − S^t‖_F^2    (24)

where f denotes the objective function in (21) and ρ > 0 is the proximal parameter. The details of the optimizations of W_k, h_k, and S are as follows.
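The mechanics of the PAO updates can be shown on a toy bilinear problem; the objective f(w, h) = (wh − 2)^2 is a stand-in assumption for the full DLST energy, and each step minimizes over one variable plus the proximal term exactly:

```python
def pao(w, h, rho=0.5, iters=100):
    """Proximal alternating minimization of f(w, h) = (w*h - 2)**2.
    Each update is the exact minimizer of f plus rho*(x - x_prev)**2,
    obtained by setting the derivative of the quadratic to zero."""
    for _ in range(iters):
        w = (2 * h + rho * w) / (h * h + rho)   # argmin over w, h fixed
        h = (2 * w + rho * h) / (w * w + rho)   # argmin over h, w fixed
    return w, h

w, h = pao(1.0, 1.0)
print(w * h)   # approaches 2, a critical point of the toy energy
```

At any fixed point the proximal terms vanish and wh = 2, mirroring how the PAO iterates settle at a critical point of (21).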
1) Optimization of W_k: With h_k and S fixed, the optimization of W_k in (22) can be written as

min_(W_k) ‖Y_h(3) − Σ_k H h_k^T W̃_k − H S_(3)‖_F^2 + ‖Y_H(3) − Σ_k h_k^T W̃_k M − S_(3) M‖_F^2 + α Ψ(W_k) + ρ ‖W_k − W_k^t‖_F^2.    (25)

Problem (25) is an unconstrained optimization problem and is solved by the ADMM [57]. By introducing the auxiliary variable U_k = W_k, the following augmented Lagrangian function is acquired:

L(W_k, U_k, P) = g(W_k) + α Ψ(U_k) + ⟨P, W_k − U_k⟩ + (μ/2) ‖W_k − U_k‖_F^2    (26)

where g(W_k) collects the quadratic data-fidelity and proximal terms of (25), P is the Lagrange multiplier, and μ is the penalty parameter. The augmented Lagrangian function (26) can be minimized by iteratively solving the following subproblems.

a) Update W_k: With U_k fixed, the subproblem to update W_k in (26) can be written as

min_(W_k) g(W_k) + (μ/2) ‖W_k − U_k + P/μ‖_F^2.    (27)

Problem (27) is quadratic, and its unique solution amounts to computing a general Sylvester matrix equation

A_1 W̃_k + W̃_k A_2 = A_3.    (28)

b) Update U_k: With W_k fixed, the subproblem to update U_k can be written as

min_(U_k) α Ψ(U_k) + (μ/2) ‖U_k − (W_k + P/μ)‖_F^2    (29)

which, for the ith unfolding of each group, can be rewritten as

min_(U^k_g(i)) α ζ_i Σ_j log(σ_j(U^k_g(i)) + ε) + (μ/2) ‖U^k_g(i) − W^k_g(i)‖_F^2    (30)

where ζ_i = ‖∇W^k_g(i)‖_2, which is obtained using the last updated W_k to simplify the calculation. Problem (30) is a low-rank recovery problem; thus, it has the following solution:

U^k_g(i) = U_i Σ̃_i V_i^T    (31)

where U_i, Σ_i, and V_i are acquired by the SVD of W^k_g(i), and Σ̃_i is the result of shrinking the larger singular values slightly and the smaller singular values strongly from Σ_i [39].
The obtained matrices U^k_g(1), U^k_g(2), and U^k_g(3) are restored to tensors, and their mean is used to obtain the final U_k.

Algorithm 1: Process for the optimization of W_k for solving problem (22).
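The weighted singular value shrinkage used in the U_k update can be sketched as follows; the reweighting rule (shrinkage inversely proportional to the previous singular value, in the spirit of [39]) is an assumption for illustration, not the paper's exact operator:

```python
import numpy as np

def logsum_svt(A, tau, eps=1e-3):
    """Weighted singular value thresholding under a log-sum surrogate:
    large singular values shrink slightly, small ones shrink strongly."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_new = np.maximum(s - tau / (s + eps), 0.0)   # weight ~ 1/(sigma + eps)
    return U @ np.diag(s_new) @ Vt

A = np.diag([10.0, 1.0, 0.1])
out = logsum_svt(A, tau=0.2)
print(np.round(np.diag(out), 3))   # large value barely moves, smallest vanishes
```

This is precisely the behavior described for Σ̃_i: the dominant structure of each unfolding is preserved while weak, likely noisy, components are suppressed.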
2) Optimization of h_k: With W_k and S fixed, the optimization of h_k in (23) can be written as

min_(h_k) ‖Y_h(3) − Σ_k H h_k^T W̃_k − H S_(3)‖_F^2 + ‖Y_H(3) − Σ_k h_k^T W̃_k M − S_(3) M‖_F^2 + ρ ‖h_k − h_k^t‖_F^2.    (32)

Problem (32) is quadratic, and its unique solution amounts to computing a general Sylvester matrix equation (33). Finally, the conjugate gradient (CG) method [54] is used to solve (33); its symmetric positive definite system matrix, simple calculation steps, and fast convergence prompted the choice of the CG.
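A minimal CG sketch shows why it fits here: the normal-equation systems arising from these quadratic subproblems are symmetric positive definite, which is exactly the setting in which plain CG applies (the system below is a random SPD stand-in, not the paper's actual operator):

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A."""
    x = np.zeros_like(b)
    r = b - A @ x                  # initial residual
    p = r.copy()                   # initial search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)      # exact line search along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p  # conjugate direction update
        rs = rs_new
    return x

B = np.random.rand(5, 5)
A = B @ B.T + 5 * np.eye(5)        # SPD system matrix
b = np.random.rand(5)
x = conjugate_gradient(A, b)
print(np.allclose(A @ x, b))       # True
```

In exact arithmetic CG terminates in at most as many iterations as the system dimension, which is the "fast convergence" property noted above.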

3) Optimization of S:
With W_k and h_k fixed, the optimization of S in (24) can be written as

min_S ‖Y_h(3) − Σ_k H h_k^T W̃_k − H S_(3)‖_F^2 + ‖Y_H(3) − Σ_k h_k^T W̃_k M − S_(3) M‖_F^2 + λ Φ(S) + γ ‖S‖_(1,2) + ρ ‖S − S^t‖_F^2.    (34)

To solve (34) efficiently, we introduce two auxiliary tensors V and L to substitute for S in the terms Φ(S) and ‖S‖_(1,2), respectively. Then, the minimization problem (34) can be reformulated as the following constrained optimization problem:

min_(S, V, L) q(S) + λ Φ(V) + γ ‖L‖_(1,2)   s.t.   V = S,   L = S    (35)

where q(S) collects the quadratic data-fidelity and proximal terms of (34). Problem (35) can be minimized by iteratively solving the following subproblems.

a) Update S: With V and L fixed, the subproblem to update S in (35) can be written as

min_S q(S) + (μ/2) ‖S − V‖_F^2 + (μ/2) ‖S − L‖_F^2.    (36)

Problem (36) is quadratic, and its unique solution amounts to computing a general Sylvester matrix equation (37). Finally, CG is used to solve (37) due to its symmetric positive definite system matrix, simple calculation steps, and fast convergence.

b) Update L: With V and S fixed, the subproblem to update L in (35) can be written as

min_L γ ‖L‖_(1,2) + (μ/2) ‖L − S‖_F^2.    (38)

This subproblem can be solved by the half-quadratic optimization strategy [58].

c) Update V: The subproblem of updating V can be reformulated slice by slice over the frontal slices:

min_(V^(h)) λ Φ(V^(h)) + (μ/2) ‖V^(h) − S^(h)‖_F^2.    (41)

Due to its simplicity and stability, we apply the primal-dual algorithm [59], [60], where ∇V^(h) denotes the matrix of the differences ∇_(i,j) V^(h). The subproblem (41) can then be easily rewritten as a saddle-point problem, and iteration equations are used to solve the resulting primal-dual problem (43).

Algorithm 2: Process for the optimization of the proposed DLST method.

IV. EXPERIMENT RESULTS AND ANALYSIS
A. Experiment

1) Datasets: Three datasets were used to test the performance of the proposed method. The first dataset (Pavia University) was acquired by the reflective optics system imaging spectrometer (ROSIS) sensor during a flight campaign over Pavia University, Northern Italy. The image is of size 610 × 340 × 115, with a 1.3-m spatial resolution and a spectral coverage ranging from 0.43 μm to 0.86 μm. The second dataset (CAVE) was acquired by a cooled CCD camera (Apogee Alta U260) over 32 scenes. The images are of size 512 × 512 × 31, with a spectral coverage ranging from 0.4 μm to 0.7 μm at a 0.01-μm interval. The third dataset (Chikusei) was taken by a Headwall Hyperspec-VNIR-C imaging sensor over agricultural and urban areas in Chikusei, Ibaraki, Japan. The image is of size 2517 × 2335 × 128, with a 2.5-m spatial resolution and a spectral coverage ranging from 0.363 μm to 1.018 μm. We generated the LR-HSI by down-sampling the reference image, averaging over disjoint s × s blocks, where s is the scaling factor, and simulated the HR-MSI by down-sampling the reference along the spectral dimension using the spectral response matrix H derived from a Nikon D700 camera.
2) Quantitative Metrics: To evaluate the performance of the hyperspectral superresolution reconstruction of all methods, the following six image quality indices were adopted in this study: 1) peak signal-to-noise ratio (PSNR), 2) universal image quality index (UIQI), 3) root-mean-squared error (RMSE) [61], 4) structural similarity (SSIM) [62], 5) spectral angle mapper (SAM), and 6) erreur relative globale adimensionnelle de synthèse (ERGAS) [63]. The PSNR, RMSE, and UIQI evaluate the similarity between the estimated HR-HSI and the reference. The SSIM, SAM, and ERGAS measure the structural consistency and the spectral distortion between the reconstructed HR-HSI and the reference. The larger the PSNR, SSIM, and UIQI values, the better the reconstructed results; the smaller the RMSE, SAM, and ERGAS values, the less the reconstruction error and the spectral distortion.
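Three of the listed indices can be sketched directly from their standard definitions (the peak value of 1.0 and the degree units for SAM are assumptions about the evaluation setup):

```python
import numpy as np

def rmse(ref, est):
    return np.sqrt(np.mean((ref - est) ** 2))

def psnr(ref, est, peak=1.0):
    return 10 * np.log10(peak ** 2 / np.mean((ref - est) ** 2))

def sam(ref, est, eps=1e-12):
    """Mean spectral angle in degrees over all pixels; bands on the first axis."""
    num = np.sum(ref * est, axis=0)
    den = np.linalg.norm(ref, axis=0) * np.linalg.norm(est, axis=0) + eps
    ang = np.arccos(np.clip(num / den, -1.0, 1.0))
    return np.degrees(ang).mean()

ref = np.random.rand(31, 16, 16)
est = np.clip(ref + 0.05 * np.random.randn(*ref.shape), 0, 1)
print(rmse(ref, est), psnr(ref, est), sam(ref, est))
```

A uniform offset of 0.1 against a reference in [0, 1], for example, gives RMSE 0.1 and PSNR 20 dB, which is a quick sanity check for any metric implementation.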

B. Selection of Parameters
For the DLST method, two key parameters, K and G, need to be set: K is the number of pure pixels and G is the number of groups. The former affects the decomposition of the low-rank component, and the latter affects the grouping of the manifold structure. To set appropriate values for these two parameters, the Stuffed Toy (a close-range HSI in the CAVE dataset) and the remotely sensed Pavia University image were selected for reconstruction. The aforementioned PSNR was used to evaluate the performance of the reconstructed images. The PSNR curves of the reconstructed images were obtained by setting different values for the two parameters, as shown in Fig. 4. Fig. 4(a) shows the PSNR curves of the reconstructed Stuffed Toy and Pavia University as a function of K. When K changes from 1 to 6, the PSNR of CAVE increases, but when K is greater than 8, the PSNR starts to decrease; therefore, K = 7 was adopted for CAVE. The PSNR of Pavia University rises rapidly as K increases from 1 to 11 and then changes little; as a result, K was set to 11 for Pavia University. For scenes with strong spectral similarity, such as the small indoor scenes captured in CAVE, which contain far fewer objects, a smaller value is set for K. For scenes with complex spectra, such as the remotely sensed Pavia University, which contains many different objects, a larger value is set for K. Fig. 4(b) shows the PSNR curves of the reconstructed Stuffed Toy and Pavia University as a function of G. The PSNR of CAVE rises as G increases from 20 to 160 and then changes little. The PSNR of Pavia University rises as G increases from 20 to 400 and then decreases as G increases from 400 to 450.
As a result, G was set to 160 for CAVE and 400 for Pavia University. For scenes with simple texture, e.g., CAVE, whose texture is monotonous because of the few indoor targets, a small value is set for G; for scenes with complex texture, e.g., Pavia University, whose texture is chaotic because of the various ground objects, a large value is set for G.

C. Convergence Analysis
In order to analyze the convergence of the method, DLST was used to reconstruct the HR-HSIs of Pavia University and the Stuffed Toy. The RMSE curves of the reconstructed HR-HSIs were obtained by setting different numbers of iterations, as shown in Fig. 5. Fig. 5(a) and (b) shows the RMSE curves of the reconstructed Pavia University and Stuffed Toy as a function of the number of iterations, respectively. The RMSEs of Pavia University and the Stuffed Toy decrease as the number of iterations rises from 1 to 32, and then converge gradually to 1.5971 and 1.557, respectively. This shows that the method has good convergence in the reconstruction of HR-HSIs. The number of iterations for DLST was therefore set to 32.

D. Ablation Test
In order to verify the effectiveness of DLST, three ablation tests were carried out, concerning the DLST decomposition, the nonlocal constraint of adaptive manifold via TTR, and the surface-aware regularization combined with GS.

1) Decomposition Test: In order to verify the contribution of the decomposition to hyperspectral superresolution reconstruction, DLST without the sparse component (SC) and full DLST were used to reconstruct the HR-HSIs of Pavia University and the Stuffed Toy. K, an important parameter of the decomposition, was set to different values to control it. The PSNR curves of the reconstructed images obtained by varying K for the two methods are shown in Fig. 6. Fig. 6(a) and (b) shows the PSNR curves of the reconstructed Pavia University and Stuffed Toy as functions of K, respectively. The PSNRs of Pavia University by DLST without SC and by DLST rise as K increases from 1 to 15 and from 1 to 11, and then remain at 42.91 and 44.9, respectively. The PSNRs of CAVE by DLST without SC and by DLST rise as K increases from 1 to 7 and from 1 to 12, and then remain at 42.1 and 44.45, respectively. This shows that DLST reconstructs HSIs more effectively than DLST without SC: the sparse component compensates for the information loss caused by the low-rank constraint in the reconstruction of the HR-HSI.
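The idea of splitting an image into a low-rank part plus a sparse residual can be illustrated with a generic RPCA-style alternation on a matrix unfolding: singular value thresholding fits the low-rank component, and entrywise soft thresholding fits the sparse one. This is a simplified stand-in for the paper's tensor-based solver, with thresholds and function names chosen by us:

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft(M, tau):
    """Entrywise soft thresholding: proximal operator of the l1 norm."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def lowrank_sparse_split(X, tau_l=1.0, tau_s=0.1, n_iter=50):
    """Alternately fit L (low-rank) and S (sparse) so that X ~ L + S."""
    L, S = np.zeros_like(X), np.zeros_like(X)
    for _ in range(n_iter):
        L = svt(X - S, tau_l)
        S = soft(X - L, tau_s)
    return L, S
```

In this sketch, dropping `S` (the ablated "DLST without SC") forces the low-rank part alone to absorb outlier structures, which is exactly the information loss the sparse component is meant to compensate.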
2) Nonlocal Constraint of Adaptive Manifold Test: In order to verify the contribution of the nonlocal constraint of adaptive manifold via TTR to hyperspectral superresolution reconstruction, three methods were used to reconstruct the HR-HSIs of Pavia University and the Stuffed Toy: DLST without nonlocal self-similarity (NS), which replaces nonlocal self-similarity via TTR with matrix low-rankness via the nuclear norm; DLST without adaptive manifold (AM), which replaces the manifold with the image intensity; and full DLST. Table I shows the means of the PSNR and RMSE values of the HR-HSIs reconstructed by the three methods for Pavia University and the Stuffed Toy. Compared with DLST without NS and DLST without AM, DLST achieves the highest PSNR and the lowest RMSE. Hence, the combination of the adaptive manifold and nonlocal self-similarity constrains spatial information more effectively.
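Nonlocal self-similarity is typically exploited by clustering similar patches into G groups and constraining each group jointly. A plain k-means grouping of patch vectors gives the flavor of this step (a sketch only: the paper groups via the adaptive manifold structure, not raw intensities, and the naive first-G initialization here is our choice):

```python
import numpy as np

def extract_patches(img, p=4, stride=4):
    """Collect p x p patches of a 2-D image as flat vectors."""
    H, W = img.shape
    return np.array([img[i:i + p, j:j + p].ravel()
                     for i in range(0, H - p + 1, stride)
                     for j in range(0, W - p + 1, stride)], dtype=float)

def kmeans_groups(patches, G, n_iter=20):
    """Plain k-means on patch vectors; returns one group label per patch.
    Centers are naively initialized with the first G patches."""
    centers = patches[:G].copy()
    labels = np.zeros(len(patches), dtype=int)
    for _ in range(n_iter):
        d = ((patches[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = d.argmin(1)
        for g in range(G):
            if np.any(labels == g):
                centers[g] = patches[labels == g].mean(0)
    return labels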
3) Surface-Aware Regularization Combined With GS Test: In order to verify the contribution of the surface-aware regularization combined with GS to hyperspectral superresolution reconstruction, DLST without GS, DLST without surface-aware regularization (SA), and full DLST were used to reconstruct the HR-HSIs of Pavia University and the Stuffed Toy. The PSNR curves of the reconstructed images obtained by varying K for the three methods are shown in Fig. 7. Fig. 7(a) and (b) shows the PSNR curves of the reconstructed Pavia University and Stuffed Toy as functions of K, respectively. The PSNRs of Pavia University by the three methods rise as K increases from 1 to 11 and then remain at 41.71, 42.92, and 44.95, respectively. The PSNRs of CAVE by the three methods rise as K increases from 1 to 7 and then remain at 41.32, 43.21, and 44.55, respectively. This shows that DLST reconstructs HSIs more effectively than DLST without GS and DLST without SA: the combination of surface-aware regularization and GS constrains the sparse component of the HR-HSI more accurately in the reconstruction.
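Group sparsity over the spectral axis enforces the property the paper targets: a pixel in the sparse component is either active in every band or zero in every band, so all bands share the same spatial structure. The standard tool is the proximal operator of the l2,1 norm, sketched here on an H x W x B cube (a generic operator, not the paper's exact regularizer; the function name is ours):

```python
import numpy as np

def group_soft(S, tau):
    """Proximal operator of the l2,1 norm along the spectral axis:
    each pixel's band vector is shrunk jointly, so the pixel is
    either kept in all bands or zeroed in all bands."""
    norms = np.linalg.norm(S, axis=-1, keepdims=True)        # H x W x 1
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return S * scale
```

Pixels whose spectral vector has small energy are removed entirely, which is how the GS term suppresses false individual regions that appear in only a few bands.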
1) Spatial Reconstruction: The performance in terms of spatial reconstruction was evaluated on Pavia University and the Stuffed Toy, selected because they represent completely different observation scenes: the Stuffed Toy is an indoor close-range HSI, and Pavia University is an outdoor remote sensing HSI. The spatial reconstructions from all six methods were compared both visually and quantitatively. For visual comparison, the reconstructed images and corresponding error images of the two HSIs at the bands with the largest and smallest spectral reflections are shown in Figs. 8 and 9, respectively. For quantitative comparison, the mean of the PSNR values over all bands of the reconstructed HR-HSIs of the two HSIs is shown in Fig. 10.
In Fig. 8, the reconstructed images and corresponding error images are shown in the first/second and third/fourth rows for Pavia University on the 30th and 80th bands, respectively. The last column (g) is the ground truth. In each subfigure, the large red box is a zoom-in of the small red box for clearly observing the small region. The two special areas in the small red boxes of the upper two rows and the bottom two rows are, respectively, the cars parked sparsely at the edge of the green belt on the 30th band and the duplicated T-shaped building on the 80th band. Because the cars occupy a small area and differ strongly from their surrounding regions in the spatial domain, their spatial structure is difficult to reconstruct. The duplicated T-shaped building serves as an investigation area owing to its clear spatial low-rank structure. From the subregion (large red box) comparison of the error images, the HR-HSI constructed by DLST is closer to the ground truth in both the car and T-shaped building regions than those of all the other methods. Therefore, the DLST method retains more spatial information of the cars and the duplicated T-shaped building than the other five methods.
In Fig. 9, the reconstructed images and corresponding error images are shown in the first/second and third/fourth rows for the Stuffed Toy on the 5th and 26th bands, respectively. The small and large red boxes in each subfigure play the same role as those in Fig. 8. The two special areas in the small red boxes of the upper two rows and the bottom two rows are, respectively, the cluttered background on the 5th band and the striped sweater on the 26th band. The cluttered background has complex and irregular texture, whereas the striped sweater has a simple low-rank texture. From the subregion (large red box) comparison of the error images from the different methods, the HR-HSI recovered by DLST is closer to the ground truth in both the complex and the simple structured texture regions than those of all the other methods. This result suggests that the DLST method preserves more detailed characteristics than the other five methods.
For quantitative comparison, Fig. 10 shows the mean of the PSNR values over all bands of the reconstructed HR-HSIs from each method. Fig. 10(a) and (b) shows the results for Pavia University and the Stuffed Toy, respectively. Both subfigures indicate that the proposed DLST method outperforms the other methods on all bands.
2) Spectral Reconstruction: The spectral reconstructions of the six methods were compared both visually and quantitatively. Fig. 11(a) and (b) shows the spectral reflection curves of the two regions (the cars and the cluttered background) in the reconstructed HR-HSIs of Pavia University and the Stuffed Toy, respectively. Again, the large red box is a zoom-in of the small one. Both subfigures indicate that the spectral reflection curve of DLST is closer to the truth than those of all the other methods (see the large red boxes). The ED and cosine similarity (CS) were used to measure the similarity between the spectral reflection curves constructed by the six methods and the true spectral reflection curve; the results are listed in Table II. The smaller the ED and CS values, the more similar the spectral reflection curves. Among the six results, those of DLST are the smallest, meaning that DLST is the best performer. Table III shows the means of the PSNR, RMSE, UIQI, SSIM, SAM, and ERGAS values of the HR-HSIs constructed by the six methods for Pavia University, Chikusei, and two HSIs from the CAVE (Flowers and Lemons). DLST is the best performer in terms of all six PQIs. The performance of the NLSTF method is the closest to that of DLST for scenes with high-texture information, e.g., Pavia University and Chikusei, and lower than that of DLST for scenes with low-texture information, e.g., the Lemons and Flowers. The mean of all six PQIs from DLST is close to that of NLSTF for Pavia University. The means of the PSNR, SSIM, and UIQI values of DLST are larger than those of CSTF by 4, 0.01, and 0.004 for the Flowers, respectively, and the means of the RMSE, ERGAS, and SAM values of DLST are lower than those of NLSTF by 0.5, 0.5, and 2 for the Flowers, respectively.
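The two curve-similarity measures are straightforward to state. Since Table II reads smaller CS as more similar, CS is presumably reported as a cosine-based dissimilarity; the sketch below uses 1 - cosine, which is our assumption, and the function name is ours:

```python
import numpy as np

def spectral_similarity(ref, rec):
    """Euclidean distance and cosine dissimilarity (1 - cosine)
    between two spectral reflectance curves; smaller means more
    similar for both quantities."""
    ref = np.asarray(ref, float)
    rec = np.asarray(rec, float)
    ed = np.linalg.norm(ref - rec)
    cos = ref @ rec / (np.linalg.norm(ref) * np.linalg.norm(rec))
    return ed, 1.0 - cos
```

Applying this to the reconstructed and true curves of each region yields the per-method entries compared in Table II.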
The results of the LTMR and LTTR methods are the closest to those of DLST for scenes with low-texture information, e.g., the Lemons and Flowers, and lower than those of DLST for scenes with high-texture information, e.g., Pavia University and Chikusei. The mean of all six PQIs of DLST is close to that of LTMR for the Lemons. The means of the PSNR, SSIM, and UIQI values of DLST are larger than those of LTMR by 10, 0.05, and 0.04 for the Chikusei, respectively, and the means of the RMSE, ERGAS, and SAM values of DLST are lower than those of LTMR by 4, 0.6, and 2 for the Chikusei, respectively. The results of all the methods in Table III were further analyzed in terms of their own characteristics. The performance of Hysure is poor for scenes with low-texture information, e.g., the Lemons, because the TV regularization hardly characterizes such scenes. The CSTF method uses coupled sparse tensor factorization to exploit spatial-spectral correlations, leading to better results than Hysure. The LTTR and LTMR methods explore the low-rank property based on nonlocal spatial self-similarity and hardly maintain texture details for scenes rich in complex texture, e.g., Pavia University and Chikusei. The NLSTF method lacks a priori constraints in homogeneous regions and therefore produces more errors there than LTTR and LTMR for the Lemons. In contrast, the DLST method has the highest accuracy in both spatial and spectral preservation for the three datasets.

4) Runtime Analysis: To show the computational efficiency of the six methods, Table IV reports the average computational time in seconds of the compared methods on Pavia University, CAVE, and Chikusei. All experiments were implemented in MATLAB R2021b on a computer with an Intel Core i5-11400F CPU at 2.6 GHz and 16 GB of random access memory. As can be seen from the table, the Hysure method needs the least computational time on the three datasets. Our method needs relatively more computation time on the three datasets, and its main computational burden comes from solving (25), where the TTR is computed for each cluster. The time complexity of solving (29) is mainly related to the size of the patches, the number of patches in each group, and the number of spectral ranks.

5) Scaling Analysis: The scale factor, i.e., the ratio of the HR-MSI to the LR-HSI in the spatial domain, is an important factor affecting the reconstruction. As the scale factor increases, the spectral and spatial information contained in the LR-HSI decreases for the same HR-MSI. To compare the influence of the scale factor on the methods, scale factors of 2^1, 2^2, 2^3, 2^4, 2^5, and 2^6 were used to obtain the LR-HSIs. Then, these LR-HSIs and the same HR-MSI were used to construct the HR-HSIs with all six methods. Fig. 12 shows the PSNR and RMSE results for evaluating the quality of the HR-HSIs. As shown in Fig. 12(a) and (b), as the scale factor increases, the PSNR of the HR-HSIs constructed by the DLST method decreases, and the RMSE increases, more slowly than those of all the other methods. This implies that the proposed DLST method better resists the information reduction caused by the increase of the scale factor.
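Generating an LR-HSI at a given scale factor amounts to spatially degrading each band of the HR-HSI and subsampling. A minimal sketch using block averaging (a simple box blur; the paper does not specify its degradation kernel, so this choice and the function name are our assumptions):

```python
import numpy as np

def downsample_hsi(hsi, scale):
    """Simulate an LR-HSI from an H x W x B HR-HSI by averaging over
    scale x scale spatial blocks in every band (box blur + subsampling)."""
    H, W, B = hsi.shape
    H2, W2 = H // scale * scale, W // scale * scale       # crop to a multiple
    x = hsi[:H2, :W2].reshape(H2 // scale, scale, W2 // scale, scale, B)
    return x.mean(axis=(1, 3))

# LR-HSIs at scale factors 2^1 .. 2^6, as in the scaling experiment:
# lr_list = [downsample_hsi(hr_hsi, 2 ** k) for k in range(1, 7)]
```

Doubling the scale factor quarters the number of LR pixels per band, which is why the information available to the fusion drops so quickly at 2^5 and 2^6.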

V. CONCLUSION
In this study, to exploit the spatial and spectral information in individual regions of interest for hyperspectral superresolution reconstruction, a new method named DLST is proposed to divide an unknown HR-HSI into two components: 1) a low-rank component and 2) a sparse component, which capture the spectral low-rank property of the global region and the sparsity of the individual special regions, respectively. The low-rank component was constrained by the nonlocal constraint of adaptive manifold via TTR to balance the local low-rank property and the nonlocal self-similarity of the manifold structure, and surface-aware regularization combined with GS was used to constrain the sparse component in the reconstruction. Experimental results on three datasets show that the new method outperforms the other five methods in both spatial and spectral reconstruction. The DLST method can extract spatial information from HR-MSIs and restore spectral information in LR-HSIs.