Optimization-Based Hyperspectral Spatiotemporal Super-Resolution

Due to hardware limitations and financial considerations, it is challenging to acquire fine spatial and fine temporal resolution (FSFT) images, which motivates interest in spatiotemporal fusion. Instead of directly obtaining costly FSFT images, an alternative is to fuse fine spatial, coarse temporal resolution (FSCT) images with coarse spatial, fine temporal resolution (CSFT) images. Unlike existing spatiotemporal methods, which are designed only for multispectral images, this paper proposes a new fusion framework for hyperspectral spatiotemporal super-resolution, termed HSTSR. We first address the coarse temporal resolution issue by adopting the fast iterative shrinkage-thresholding algorithm (FISTA) to estimate the missing images at the intermediate times of the series. Then, we fuse the hyperspectral image and the multispectral image at each time via coupled nonnegative matrix factorization (CNMF) to obtain FSFT hyperspectral images. Importantly, we can automatically estimate the associated spatial blurring and spectral downsampling matrices without prior satellite hardware information. Compared with other extended multispectral spatiotemporal methods, our method not only attains satisfactory quality significantly faster, but also requires much less input data.


I. INTRODUCTION
Spectral information in a satellite remote sensing image is a critical resource for many applications, such as surface change detection and precision agriculture [1], [2]. However, due to hardware limitations and cost considerations, satellite sensors cannot obtain images with both fine spatial and fine spectral resolutions; that is, there exists a trade-off between spatial and spectral resolutions. Therefore, resolution enhancement, also known as image super-resolution [3], [4], is a prominent field of study which enhances the resolutions of an image in different dimensions to overcome the trade-off dilemma. These dimensions include the spatial, spectral, and temporal characteristics of a remote sensing image. To enhance multiple image resolutions, techniques such as super-resolution mapping [5] utilize the inter-dimensional spatial and spectral correlations of the original coarse image to counter the influence of linear and nonlinear imaging conditions. Considering two-dimensional spatial/spectral super-resolution, methods usually combine a coarse spatial, fine spectral resolution image with a fine spatial, coarse spectral resolution image into a fine spatial, fine spectral resolution image. Pansharpening technologies such as [6] fuse multispectral images with fine spatial resolution panchromatic (single-band) images, while other methods such as [7] fuse multispectral images with hyperspectral (many-band) images.
The associate editor coordinating the review of this manuscript and approving it for publication was Shaoyong Zheng.
On the other hand, considering the two-dimensional spatial/temporal case, known as spatiotemporal super-resolution (STSR), methods enhance the temporal and spatial resolutions of a specified area. This is important because of the need for continuous monitoring of human activities [8], agricultural yields [9], and forest reflectance [10]. To be specific on the trade-off between temporal and spatial resolution, it can be expensive for a satellite to have a short revisiting cycle (i.e., fine temporal resolution) and obtain fine spatial resolution images each time it passes over the specified area. Furthermore, even if a satellite sensor possesses such qualities, the acquired images may suffer from serious atmospheric effects, such as cloud masking phenomena [11], leading to situations where useful imagery is irregularly missing, thereby degrading the actual temporal resolution of the sensor.
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Regarding the STSR problem of acquiring fine spatial and temporal resolution (FSFT) images, many existing methods are inspirational. Some utilize and extend the merits of three classical methods: 1) STARFM [12]: Each pixel is predicted from the similar neighboring pixels within a local window, where the similar neighbors are weighted by three measures: the spectral difference, temporal difference, and location distance. Zhu et al. [13] further proposed an enhanced version, ESTARFM, to improve performance on heterogeneous landscapes. 2) Fit-FC [14]: In the first stage, regression model fitting (RM) constructs a linear model within a local window and uses least squares to estimate the coefficients. Next, a spatial filtering (SF) technique is applied to remove the blocky artifacts caused by RM. Finally, the residual compensation stage upsamples the coarse residuals via bicubic interpolation and uses spectrally similar neighboring pixels to update the residuals. 3) FSDAF [15]: It is composed of six steps and considers the reflectance change caused by both endmember and class fraction changes. The thin plate spline (TPS) interpolator, together with information from residuals and neighboring pixels, is utilized to predict the fine-resolution image. There are many variations and developments of FSDAF: Liu et al. [16] designed IFSDAF to perform super-resolution on normalized difference vegetation index (NDVI) images, and SFSDAF [17] incorporates sub-pixels to obtain a more precise result.
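To make the classical weighting idea concrete, the following is a minimal sketch in the spirit of STARFM: each fine pixel at the prediction time is obtained by adding a weighted sum of the coarse temporal changes observed in a local window, where the weights combine spectral difference, temporal difference, and location distance. The weight formula below is a simplified illustrative stand-in, not the exact STARFM formulation.

```python
import numpy as np

def starfm_like_predict(fine_t1, coarse_t1, coarse_t2, win=3):
    """Sketch of STARFM-style prediction (single band, co-registered
    images resampled to the same grid). Weights are a simplified
    inverse combined difference, not the exact STARFM weights."""
    H, W = fine_t1.shape
    pred = np.empty_like(fine_t1)
    r = win // 2
    for i in range(H):
        for j in range(W):
            i0, i1 = max(0, i - r), min(H, i + r + 1)
            j0, j1 = max(0, j - r), min(W, j + r + 1)
            # spectral, temporal, and location-distance measures
            spec = np.abs(fine_t1[i0:i1, j0:j1] - coarse_t1[i0:i1, j0:j1])
            temp = np.abs(coarse_t2[i0:i1, j0:j1] - coarse_t1[i0:i1, j0:j1])
            yy, xx = np.mgrid[i0:i1, j0:j1]
            dist = 1.0 + np.hypot(yy - i, xx - j)
            w = 1.0 / ((spec + 1e-6) * (temp + 1e-6) * dist)
            w /= w.sum()
            # fine prediction = fine at t1 + weighted coarse change
            change = coarse_t2[i0:i1, j0:j1] - coarse_t1[i0:i1, j0:j1]
            pred[i, j] = fine_t1[i, j] + np.sum(w * change)
    return pred
```

When the coarse image does not change between the two times, the sketch returns the fine image at the base time unchanged, which is the expected degenerate behavior.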
There are also methods that cannot be straightforwardly categorized as the extensions of STARFM, Fit-FC, and FSDAF. In the recently published Reliable and Adaptive Spatiotemporal Data Fusion (RASDF) method [18], a novel reliability index is introduced to measure the spatial reliability distribution of the input data. It participates in the calculations of the residual model and weight of reliability to identify the optimal preliminary and final fusion results, resp. Global and local unmixing models were applied to obtain strong spectral information changes while preserving structural information. Object-based spatiotemporal fusion model (OBSTFM) [19] is another recent spatiotemporal fusion technique. It first applies multi-resolution segmentation to separate the various spectral and surface features into different objects, then a linear injection model estimates the fine image via an injection gain. Finally, a SF-based technique (with object areas replacing local windows) is applied to calculate the final estimation weights.
Alternatively, learning-based frameworks [20], [21] also provide inspirational perspectives on spatiotemporal fusion, and this field of study has expanded rapidly in recent years. The introduction of convolutional neural networks (CNNs) and generative adversarial networks (GANs) into STSR further resulted in their integration with conventional methods, producing hybrid approaches such as [22]. Another hybrid approach [23] blends STARFM with very deep super-resolution (VDSR) and evaluates the overall performance via different combinations of processes.
It should also be mentioned that a recent optimization-based method, STSRS2P [24], solves the STSR problem by incorporating the image self-similarity prior. It adopts the plug-and-play technique, a popular strategy in machine learning, and utilizes the powerful denoiser BM3D [25] as a regularizer to evaluate the self-similarity of an image. Unlike existing learning-based methods that only process data band-wise and lack investigation into the degradation relation between fine and coarse images [26], Peng et al. developed an optimization-based method, SCDNTSR, that fully utilizes the available spatial, spectral, and temporal information in satellite images. By transforming the problem into the Fourier domain, SCDNTSR achieves low-complexity STSR.
Some three dimensional fusion methods have been developed recently. Based on matrix factorization, Sheng et al. [27] proposed an integrated framework that utilizes the conjugate gradient algorithm to perform multi-dimensional image fusion. However, its inefficiency and homogeneous image constraint led to improved frameworks. Instead of adopting the matrix factorization scheme, Peng et al. [28] proposed an integrated spatio-temporal-spectral fusion framework based on the Tucker decomposition model. Image sets were viewed as four-dimensional tensors, and dictionaries associated with each dimension were updated by the Alternating Direction Method of Multipliers (ADMM) [29] to reconstruct the core tensor.
However, despite the various classical-, learning-, hybrid-, optimization-, and integration-based methods, most of them still restrict STSR to multispectral data. Compared to multispectral images (MSIs), hyperspectral images (HSIs) have up to hundreds of spectral bands, so they carry richer information and support wider usage. Based on our literature survey, fusion methods seldom consider three-dimensional super-resolution of remote sensing images directly. In addition, while optimization-based fusion methods are extensively studied in the spatial/spectral resolution enhancement framework, few exist for the STSR case. Moreover, the inputs of spatial/spectral optimization-based fusion algorithms usually include the spatial and spectral downsampling matrices, assumed known a priori [30], [31]. This can restrict the fusion inputs to the few satellites whose sensor relations are readily known, which narrows the practical applicability of these fusion methods. Therefore, in this paper, an optimization-based, three-dimensional (spatial, spectral, and temporal) super-resolution method with automatic downsampling matrix estimation, termed HSTSR, is proposed to provide a means for multi-dimensional resolution enhancement, and to introduce an additional optimization framework into the current STSR literature that accommodates a wider range of satellite options.
First, we estimate the spatial blurring and spectral downsampling matrices [7] via a novel gradient descent-ADMM cooperative algorithm. It enables automatic estimation of these matrices without prior satellite hardware information, which can potentially enable fusion between arbitrary hyperspectral/multispectral satellite sensors. Second, recall the degradation of a sensor's temporal resolution due to serious atmospheric effects. HSTSR alleviates this problem by directly estimating the missing images via the fast iterative shrinkage-thresholding algorithm (FISTA) [32], an effective proximal gradient-based algorithm with an O(1/k^2) convergence rate. After obtaining complete pairs of images over the entire time series, each pair is treated as an inverse problem with respect to (w.r.t.) the corresponding FSFT image [30]. Subsequently, based on the inverse problem, we formulate a minimization criterion, which is non-convex in general, and decouple it into multiple convex subproblems. By adopting ADMM to solve these subproblems, we combine the estimated results and obtain the final predicted FSFT HSIs.
Different from past hyperspectral/multispectral data fusion methods, HSTSR exploits the advantage of having the additional temporal dimension and constructs a novel acceleration framework to reduce the redundantly computed matrices and scalars, which are induced if fusion is conducted at the image pair level. In other words, applying conventional spatial/spectral fusion methods on individual image pairs generates similar transitional matrix and scalar values. Therefore, due to the accessibility over a certain time span, HSTSR is able to perform overall endmember estimation and only estimate the changes in abundance maps to greatly reduce the computation memory and time. Furthermore, by utilizing the abundance maps estimated in the previous times, HSTSR is able to converge faster.
In summary, the innovations of this work can be listed as follows.
1) Unlike previous methods, a new input data model is constructed in this paper. Considering hyperspectral remote sensing images, this model requires much fewer inputs than the original STSR problem, which creates a wider range in selecting satellites for image fusion. 2) We propose a new gradient descent-ADMM cooperative algorithm. Such a conceptual extension can provide insight into future iterative algorithm co-operations to solve multiple variables in one equation. Furthermore, more complex fusion models can be investigated and solved due to stronger algorithms.
3) By exploiting the additional temporal dimension, we develop an acceleration technique that reduces the computational time and memory. In addition, this technique also allows HSTSR to attain a fast convergence by choosing the appropriate input values.
The remaining parts are organized as follows. In Section II, we introduce the unmixing model used in our hyperspectral STSR framework and elaborate on the procedures to solve it, including closed-form solutions and algorithm pseudocodes. In Section III, experimental results over different landscapes and image time intervals are presented. Considering the availability of reference images, we conducted our experiments on simulation data generated from the Hyperion push-broom imager aboard NASA's Earth Observing-1 (EO-1) satellite. We compare the proposed method with Fit-FC, STARFM, SFSDAF, RASDF, and OBSTFM, due to the lack of direct hyperspectral STSR methods in the current literature. Results show that despite the significant reduction in the amount of input data, HSTSR still achieves comparable, even superior performance while consuming much less computational time than the extended peer methods. Finally, we conclude the paper and suggest potential future work on hyperspectral STSR in Section IV.
Standard notations used in this paper are collectively presented hereinafter. R^n, R^{m×n}, and R^{m×n}_+ denote, resp., the set of real-valued n-dimensional vectors, the set of real-valued m-by-n matrices, and the set of real-valued m-by-n matrices with nonnegative entries. The symbols 1_{m×1}, 0_{m×n}, and I_m represent, resp., an all-one column vector of size m, an all-zero matrix of size m-by-n, and an identity matrix of size m-by-m. [·]_{r_i}, [·]_{c_j}, and [·]_{ij} denote, resp., the i-th row, the j-th column, and the (i, j)-th entry of a given matrix. The ℓ_p-norm and the Frobenius norm are denoted by ‖·‖_p and ‖·‖_F, resp. vec(·) denotes the vectorization operator, ⊗ denotes the Kronecker product, and ⪰ denotes the componentwise inequality. [·]_+ denotes the orthogonal projection onto the nonnegative orthant of the Euclidean space. For vectors v and u, max{v, u} returns a vector whose i-th entry is the maximum of [v]_i and [u]_i. "\" denotes the set difference, and ∇ denotes the gradient operator. I_+(z) is the indicator function: I_+(z) = ∞ if any element of z is negative, and 0 otherwise. ⟨·, ·⟩ denotes the inner product.
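For readers who prefer operational definitions, the projection, componentwise maximum, and indicator notations above map directly onto NumPy operations; a small illustration (assuming NumPy, with arbitrary example values):

```python
import numpy as np

v = np.array([-1.5, 0.0, 2.0])
u = np.array([1.0, -3.0, 2.5])

# [.]_+ : orthogonal projection onto the nonnegative orthant
proj = np.maximum(v, 0.0)          # negative entries clipped to zero

# max{v, u}: componentwise maximum of two vectors
m = np.maximum(v, u)

# I_+(z): indicator of the nonnegative orthant
def ind_plus(z):
    """Returns inf if any element of z is negative, else 0."""
    return np.inf if np.any(z < 0) else 0.0
```

These three operations appear repeatedly in the closed-form ADMM updates of Section II.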

II. SIGNAL MODEL AND PROBLEM FORMULATION
First, we construct the signal model of the fusion problem [30] (cf. (1), (2)). The HSIs and MSIs are both originally obtained as 3D data cubes. For better representation and calculation, we reshape the 3D data cubes into 2D arrays [30]. To acquire the fine spatial resolution hyperspectral image Z ∈ R^{M×L}, one way is to fuse the coarse spatial resolution hyperspectral image X ∈ R^{M×L_x} with the fine spatial resolution multispectral image Y ∈ R^{M_y×L}, where M and L_x (resp., M_y and L) denote the number of spectral bands and the number of pixels in X (resp., Y), with M > M_y bands and L_x < L pixels. The relationship between them can be formulated as

X = ZB + E_x, (1)
Y = DZ + E_y, (2)

where B ∈ R^{L×L_x} is the spatial blurring matrix, D ∈ R^{M_y×M} is the spectral downsampling matrix, and E_x and E_y are the residuals [33]. Furthermore, we introduce the linear mixing model (LMM) [34] linked to our work. We assume that Z can be factorized by the endmember matrix A ∈ R^{M×q} and the corresponding abundance map S ∈ R^{q×L}, i.e., Z = AS, where q denotes the number of endmembers (i.e., materials). In detail, each pixel pillar [Z]_{c_n} ∈ R^M, n = 1, ..., L, is a linear combination of the endmembers, [Z]_{c_n} = A[S]_{c_n}. To conform to their physical meanings, both A and S are assumed to be nonnegative [35], [36], with S further satisfying the sum-to-one assumption [31] (i.e., 1_{1×q}S = 1_{1×L}).
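The degradation model (1)-(2) and the LMM can be illustrated with a toy example. The sizes, the block-averaging B, and the band-averaging D below are illustrative assumptions for demonstration only, not the paper's actual sensor matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
M, My, L, Lx, q = 60, 6, 100, 25, 4      # toy dimensions (illustrative)

A = rng.random((M, q))                    # endmember signatures (nonnegative)
S = rng.random((q, L))
S /= S.sum(axis=0)                        # abundances: nonnegative, sum-to-one
Z = A @ S                                 # LMM: the target FSFT-style image

B = np.zeros((L, Lx))                     # crude spatial blurring/downsampling:
for j in range(Lx):                       # average every 4 fine pixels
    B[4 * j:4 * (j + 1), j] = 0.25

D = np.zeros((My, M))                     # crude spectral downsampling:
for s in range(My):                       # average every 10 hyperspectral bands
    D[s, 10 * s:10 * (s + 1)] = 0.1

X = Z @ B                                 # coarse-spatial HSI observation, (1)
Y = D @ Z                                 # coarse-spectral MSI observation, (2)
```

Note how the staircase structure of D (each MSI band averaging a contiguous block of HSI bands) matches the uniform-spectral-response assumption used later in Section II-A.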
After introducing the fusion problem and the LMM, we consider the temporal dimension and build the model formulating the STSR problem. We have two sets of images, described below: 1) The first set is a series of CSFT HSIs, X_1, ..., X_T, where T denotes the number of time instants in the series.
2) The second set is the pair of FSCT MSIs Y_1 and Y_T (the intermediate MSIs are missing). In our model, we aim to obtain FSFT HSIs by combining the spatial, spectral, and temporal information in the mentioned CSFT HSIs and FSCT MSIs. The hyperspectral STSR problem can be decoupled into three subproblems: 1) Detailed in Section II-A, instead of directly using a priori D and B information [31], the complete pairs of images (i.e., (X_1, Y_1) and (X_T, Y_T)) are used to estimate the spectral downsampling matrix D and the spatial blurring matrix B. 2) Detailed in Section II-B, the widely used linear interpolation [37] and FISTA are applied to estimate the missing FSCT MSIs at the intermediate times. 3) Detailed in Section II-C, we fuse each image pair (i.e., (X_i, Y_i)) to obtain the FSFT HSIs Z_i, ∀i ∈ I_T.

A. SPECTRAL DOWNSAMPLING AND SPATIAL BLURRING MATRIX ESTIMATION
In this section, we estimate the spectral downsampling matrix D and the spatial blurring matrix B iteratively. D is estimated via ADMM [29], while B is estimated with the gradient descent with momentum algorithm [38]. First, by [30], B can be decoupled as B = Π(I_{L_x} ⊗ g), where g ∈ R^{r²×1} is a vectorized Gaussian blurring kernel with variance v [31], [39], [40], r = √(L/L_x) denotes the spatial subsampling factor, and Π denotes the permutation matrix which transforms an image into block-wise patches [30]. At the preprocessing step, we multiply Y_i, ∀i ∈ I_T, with the permutation matrix (i.e., B can then be simplified as I_{L_x} ⊗ g), enabling us to reduce the problem size significantly instead of handling the intractable structure of the permutation.
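A vectorized Gaussian blurring kernel of the kind described above can be sketched as follows; the normalization to unit sum is a common convention and is assumed here, and the parameter values are illustrative:

```python
import numpy as np

def gaussian_kernel_vec(r, v):
    """Vectorized r-by-r Gaussian blurring kernel with variance v,
    centered on the patch and normalized to sum to one (an assumed,
    common parameterization)."""
    ax = np.arange(r) - (r - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx**2 + yy**2) / (2.0 * v))
    g /= g.sum()
    return g.reshape(-1, 1)            # column vector of length r^2

# e.g., subsampling factor r = 4 and an initial variance guess v = 1.5
g = gaussian_kernel_vec(4, 1.5)
```

Since only the scalar variance v is unknown once this parameterization is fixed, estimating B reduces to estimating v, which is what the momentum algorithm of this subsection updates.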
By further degrading (1) and (2) to coarse spatial resolution MSIs, we obtain DX_i ≈ Y_iB, ∀i ∈ {1, T} (cf. (4)). Then, by (4) and the complete pairs of images X_i and Y_i, D can be estimated by solving the regularized fitting problem (5). In our model, each multispectral band is an average of a set of hyperspectral bands; therefore, φ(D) is a regularizer that makes the spectral responses uniform [7], [41], where λ_D is the regularizer parameter, d_s contains the non-zero spectral responses in the s-th row of D, and P_s computes the differences among the entries of d_s^T (the explicit form of P_s is given later for conciseness). We assume without loss of generality (w.l.o.g.) that the rows of D are independent, so (5) can be decoupled into M_y subproblems, where each d_s matches a combination of multiple bands in X_i with the corresponding single band in Y_i. This decoupling significantly reduces the problem size by ignoring the many zeros in the staircase-like D. Then, (5) can be efficiently transformed into (6), which is solved by ADMM [29] (Algorithm 1: iteratively update the primal variable, z^{j+1} by (8), and u^{j+1} by (9), until the predefined stopping criterion is met).
h ∈ R^{M_y×1} and r ∈ R^{M_y×1} contain, resp., the beginning indices and the ranges of the sets of hyperspectral bands corresponding to each multispectral band; the submatrix of X_i entering the s-th subproblem keeps only the bands considered in that subproblem.
The associated augmented Lagrangian [42] of (6), with dual variable u and penalty parameter c, yields closed-form solutions for the primal and dual variables, which are iteratively updated (cf. (7)-(9)). The derivation of z^{j+1} follows the Karush-Kuhn-Tucker (KKT) conditions in [42]. Finally, the ADMM algorithm for solving D is summarized in Algorithm 1.
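To illustrate the ADMM structure used for the D-update, the following sketch solves a generic nonnegatively constrained least-squares subproblem with the same primal/auxiliary/dual splitting. It omits the uniformity regularizer φ(D), so it is a structural analogue of Algorithm 1 under simplified assumptions, not the paper's exact update:

```python
import numpy as np

def admm_nnls(H, y, c=1.0, iters=200):
    """ADMM sketch for min 0.5*||H d - y||^2 s.t. d >= 0, via the
    split z = d. Mirrors the primal / z / dual update pattern of the
    D-estimation (each row of D solved separately)."""
    n = H.shape[1]
    z = np.zeros(n)
    u = np.zeros(n)                        # scaled dual variable
    G = np.linalg.inv(H.T @ H + c * np.eye(n))   # factor once, reuse
    Hty = H.T @ y
    for _ in range(iters):
        d = G @ (Hty + c * (z - u))        # primal update (closed form)
        z = np.maximum(d + u, 0.0)         # projection onto nonneg orthant
        u = u + d - z                      # dual ascent
    return z
```

The key point carried over to Algorithm 1 is that every subiteration is a closed-form linear solve or projection, so no inner iterative solver is needed.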
Then, we input an initial v and the estimated D into the gradient descent with momentum algorithm [38] to update v, as demonstrated in Algorithm 2. First, we illustrate the gradient descent algorithm: for a function f with global minimum value attained at x*, the algorithm seeks x* by iteratively computing the gradient ∇f(x) and updating x accordingly. The momentum property borrows a concept from physics: a ball rolling down a slope speeds up and overcomes dents or small hills on the slope, which correspond to algorithm acceleration and the prevention of convergence to local minima, resp.
Here, D is now regarded as a matrix of constant elements, and we aim to update v by minimizing the error function e(v) (i.e., minimizing the mismatch in (4)). We compute the gradient g ≜ ∇e(v) to update v; D is then re-estimated with the updated v via (5). Finally, after several iterations, we acquire the optimized B and D.
However, the extremely small g caused by the degraded images (i.e., DX_1, Y_1B, DX_T, and Y_TB) leads to a low convergence speed. To overcome this problem, we normalize the gradients by dividing them by the magnitude of the first computed gradient g^1. This makes the convergence speed insensitive to the degraded images, which enhances the robustness of the algorithm. Thus, for the k-th iteration, the normalized gradient is g_n^k = g^k / |g^1|, where g_n^1 = −1 comes from the assumption that the optimal variance is larger than the initial variance (this can be calibrated after a few iterations if the assumption proves false). P is the predefined iteration number (P := 300 iterations in this work). We can now explain how v is updated with momentum. The variance step size Δv^{k+1} at the (k+1)-th iteration is defined as

Δv^{k+1} = γΔv^k − ηg_n^k, (12)

where the first term is the so-called momentum, and the second term is the negative gradient multiplied by a learning rate. The negative sign means that the step sizes should evolve opposite to the gradients' directions, which is the essence of gradient descent methods. The momentum is composed of the momentum term γ [38] and Δv^k, where γ ranges from 0 to 1 and decides how much of Δv^k is propagated into the next iteration; η is the learning rate, also ranging from 0 to 1. As can be seen, if g_n^k is negative, v^{k+1} increases; by contrast, if g_n^k is positive, v^{k+1} decreases. Both behaviors drive v toward an optimal value, regardless of the initial v. If the direction of the momentum is the same as −g_n^k, then Δv^{k+1} grows larger, resulting in a faster algorithm. The momentum term can therefore adjust the step size (i.e., convergence speed) dynamically. Finally, the general D-B estimation algorithm is summarized in Algorithm 3.
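The normalized-gradient momentum update described above can be sketched as follows. The toy error function and parameter values are illustrative, not the paper's e(v) or settings:

```python
import numpy as np

def momentum_min(grad, v0, eta=0.1, gamma=0.9, iters=300):
    """Gradient descent with momentum on a scalar variable, with each
    gradient normalized by the magnitude of the first gradient, as in
    the step rule  dv_{k+1} = gamma*dv_k - eta*g_n^k."""
    v = v0
    step = 0.0
    g1 = None
    for _ in range(iters):
        g = grad(v)
        if g1 is None:
            g1 = abs(g) if g != 0 else 1.0   # magnitude of first gradient
        gn = g / g1                          # normalized gradient
        step = gamma * step - eta * gn       # momentum + negative gradient
        v = v + step
    return v

# toy error e(v) = (v - 2)^2, gradient 2(v - 2), minimized at v = 2
v_opt = momentum_min(lambda v: 2.0 * (v - 2.0), v0=0.5)
```

Because the first normalized gradient is ±1 by construction, the initial step magnitude is simply η, independent of how small the raw gradients of the degraded images are, which is exactly the robustness property claimed above.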

B. FSCT MSIs ESTIMATION
We estimate the missing FSCT MSIs by defining

Y_i = Y_{i,in} + N_yi, (13)

where N_yi ∈ R^{M_y×L} are the residuals that we want to estimate, and Y_{i,in} ∈ R^{M_y×L} is generated by linearly interpolating Y_1 and Y_T [37]. Then, (4) and (13) imply that DX_i ≈ (Y_{i,in} + N_yi)B. Therefore, the N_yi estimation problem can be defined as min_{N_yi} of the data-fitting error plus the regularizer ψ(N_yi) (cf. (14)). The regularizer ψ(N_yi) is designed with parameter λ_N over the blurring patches N_yij ∈ R^{M_y×r²} (the j-th patch of N_yi), where P_N ∈ R^{r²×(0.5r²(r²−1))} is the sum-of-squared-distances (SSD) regularizer matrix [30]; ψ(N_yi) prevents N_yi from overfitting the first term in (14) and preserves the spatial details (sharp discontinuities) in Y_{i,in} by encouraging uniformity of the entries within each N_yij. Then, (14) is decoupled into L_x subproblems, one per patch (cf. (16)). Note that (16) contains a non-differentiable ℓ_1-norm with no constraint, which cannot be handled by ADMM. To solve it, we apply the widely used FISTA [32], an effective method for problems of the form

min_x f(x) + p(x),

where f(x) is a differentiable function and p(x) is a non-differentiable function. We let f(n_yij) be the smooth data-fitting and SSD terms, and p(n_yij) = ‖n_yij‖_1, which leads to the FISTA iteration (17) with the proximal operator applied at n_0 = n_yij − t∇_{n_yij} f(n_yij); the explicit solution of (17) is given entrywise for k = 1, ..., M_y r². FISTA for estimating the missing FSCT MSIs is summarized in Algorithm 4.
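Because the proximal operator of the ℓ_1-norm is soft thresholding, FISTA for an f + p problem of this form reduces to iterated thresholding plus momentum. The sketch below uses a generic quadratic f, not the paper's exact patch-wise objective, so it is a structural illustration of Algorithm 4's iteration:

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t*||x||_1 (entrywise soft thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def fista_lasso(H, y, lam, iters=300):
    """FISTA sketch for min 0.5*||H n - y||^2 + lam*||n||_1.
    Step size 1/Lip, with Lip the Lipschitz constant of grad f."""
    n = np.zeros(H.shape[1])
    m = n.copy()                           # extrapolated (momentum) point
    t = 1.0
    Lip = np.linalg.norm(H, 2) ** 2        # squared spectral norm of H
    for _ in range(iters):
        n_new = soft_threshold(m - (H.T @ (H @ m - y)) / Lip, lam / Lip)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        m = n_new + ((t - 1.0) / t_new) * (n_new - n)   # O(1/k^2) momentum
        n, t = n_new, t_new
    return n
```

The momentum sequence t_k is what distinguishes FISTA from plain ISTA and gives the O(1/k^2) rate cited earlier.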

C. FUSION OF THE CSFT HSIs AND THE FSCT MSIs
In this section, we aim to fuse the complete image pairs at each time. By (1), (2), and the LMM, the FSFT image Z_i estimation problem is formulated as the problem of estimating the endmember matrix A_i and the corresponding abundance map S_i by minimizing the weighted data-fitting terms w_1i and w_2i over the two observations (cf. (18)); the problem is based on the coupled nonnegative matrix factorization (CNMF) criterion [31]. To avoid redundant computation, we replace S_i with the known information S_{i−1} plus a change matrix ΔS_i ∈ R^{q×L} (i.e., ΔS_i = S_i − S_{i−1}). In addition, ΔS_i can be further decoupled as −S_mi + S_ai, where S_mi ⪰ 0_{q×L} (resp., S_ai ⪰ 0_{q×L}) collects the magnitudes of the negative (resp., positive) elements of ΔS_i. Therefore, S_i can be replaced by S_{i−1} − S_mi + S_ai, and we only need to deal with the abundance change in S_mi and S_ai. Hence, (18) is reformulated as (19), where the first constraint suggests that the entries of S_i are nonnegative [35], [36]. The second constraint is specially designed to deal with possible over-estimations of the nonzero elements in S_mi and S_ai; to be specific, it provides an upper bound, S_{i−1}, for S_mi. Two regularizers, φ_1(A_i) and φ_2(S_mi, S_ai), are defined with regularizer parameters λ_A, λ_S, λ_m, and λ_a. φ_1(A_i) is the SSD regularizer [30] computing the sum of distances among the endmembers; it minimizes the volume of the simplex formed by the endmembers in A_i, which leads to high-fidelity endmember estimations [43]. Then, we rewrite (20) as (21), where P ∈ R^{q×q(q−1)/2} is defined as in (6) with different dimensions. The first term in φ_2(S_mi, S_ai) promotes the sum-to-one property of S_i [31]. The ℓ_1-norm, the convex envelope of the ℓ_0-norm [42], promotes the sparsity of S_mi and S_ai, as shown in the last two terms of φ_2(S_mi, S_ai). Note that ‖S_i‖_1 ≤ ‖S_{i−1}‖_1 + ‖S_mi‖_1 + ‖S_ai‖_1, so minimizing the right-hand side leads to minimizing the left. The ℓ_1-norm of S_{i−1} can be neglected since S_{i−1} is regarded as a constant at time i.
Due to the spectral changes [44] among X_i, ∀i ∈ I_T, endmembers also change throughout time, possibly at notable scales. A straightforward remedy is to define a matrix X_tot = [X_1, ..., X_T] as the input for estimating the initial endmember matrix A_i by the HyperCSI algorithm [34]. The columns of the output initial A_i form a simplex enclosing all the data points in the q-dimensional space. However, adjacent X_i can be similar, so their data clouds in the dimension-reduced subspace can also be similar. This induces redundant computation, since a similar data cloud does not significantly contribute to the endmember matrix estimation, i.e., excessive columns of data inside the simplex are fed into HyperCSI. Thus, we define a similarity index F_i to determine whether the image pair at time i (i.e., (X_i, Y_i)) is similar to its previous image pair (i.e., (X_{i−1}, Y_{i−1})); F_i, ∀i ∈ I_T \ {1}, is calculated as in (23). Through F_i, (19) can be split into two scenarios by a threshold F to speed up the algorithm. In detail, this speed-up can be split into two parts in each scenario. 1) Scenario 1: For the first part, if F_i ≥ F, it implies that (X_i, Y_i) and (X_{i−1}, Y_{i−1}) are not alike. This means that X_tot needs to include X_i. For the second part, since the pairs are not alike, the previously estimated abundance map cannot bring us closer to the optimal solution of the present abundance map, and S_{i−1} is thus set to 0 in (19). By also setting S_mi to 0, S_ai can represent ΔS_i w.l.o.g. Considering the two parts, we can now skillfully adopt the HyperCSI algorithm [34] both to estimate endmembers including the necessary X_i data cloud and to obtain an initial S_i closer to its optimal solution. 2) Scenario 2: For the first part, if F_i < F, it implies that (X_i, Y_i) and (X_{i−1}, Y_{i−1}) are alike.
This means that X_tot does not need to include X_i. For the second part, since (X_i, Y_i) and (X_{i−1}, Y_{i−1}) are alike, the previously estimated abundance map can actually bring us closer to the optimal solution of the present abundance map than the initial S_i estimated by HyperCSI. Considering the two parts, we can now skillfully skip HyperCSI to avoid redundant computation, and estimate A_i, S_ai, and S_mi with the known S_{i−1} as the initial S_i. As a further note, three reasons explain the need for separately estimating S_mi and S_ai instead of directly estimating ΔS_i: 1) In terms of algorithm speed, directly estimating ΔS_i involves the proximal operator of the non-differentiable ℓ_1-norm, which has no closed-form solution here, leading to a time-consuming algorithm. In contrast, by handling the ℓ_1-norm term with the nonnegative constraint [30], closed-form solutions can be derived for S_mi and S_ai. 2) In terms of algorithm convergence, strong duality does not hold if we directly estimate ΔS_i, since the non-differentiable ℓ_1-norm term violates the KKT conditions [42, Chapter 9]. On the contrary, since S_mi and S_ai live in the nonnegativity-constrained domain, closed-form solutions that satisfy the KKT conditions can be derived.
3) Since strong duality does not hold when directly estimating ΔS_i, the sparsity of the abundance map may decrease throughout time as denser map changes are repeatedly added. However, by decoupling ΔS_i into S_mi and S_ai, one can guarantee that S_mi and S_ai are sparser than ΔS_i, with worst-case sparsity equal to that of S_{i−1} (proved in the Appendix). We now deal with scenario 1. Since the problem is biconvex in A_i and S_ai, we first solve for A_i (cf. (24)). By utilizing the Kronecker product property vec(AXB) = (B^T ⊗ A)vec(X), we transform (24) into (25), which can be handled by ADMM with a_i = vec(A_i); since I_{Mq}^T I_{Mq} is invertible [42, Lemma 9.2], algorithm convergence is preserved. The associated augmented Lagrangian [42] of (25), with dual variable d and penalty parameter c, yields closed-form solutions for the primal and dual variables (cf. (26)-(28)), and the A_i estimation is summarized in Algorithm 5. Secondly, we solve S_ai (cf. (29)). Letting s_ai = vec(S_ai), we reformulate (29) into (30), which can be handled by ADMM, where x_ia, y_ia, and s_{m,i−1} are the vectorized versions of the corresponding matrices. The associated augmented Lagrangian [42] of (30), with dual variable d and penalty parameter c, likewise yields closed-form primal and dual updates.
The closed-form solutions of the primal and dual variables are derived accordingly (cf. (31)-(33)). The algorithm for S_ai estimation is summarized in Algorithm 6.
Next, we deal with scenario 2, which contains two biconvex problems: one in A_i and (S_ai, S_mi), the other in S_ai and S_mi. Thus, this scenario involves estimating three variables: A_i, S_ai, and S_mi. The estimations of A_i and S_ai directly utilize (24) and (29), resp. For S_mi, letting s_mi = vec(S_mi), we reformulate (34) into (35), which can be solved by ADMM over s_mi and an auxiliary variable z ∈ R^{qL×1}. Notice that by letting the primal variable z = −s_mi, (35) is similar to (30), which enables a matrix-reuse scheme: certain matrices associated with the explicit solutions of S_ai can be directly reused in the explicit solutions of S_mi, reducing computational time. Moreover, algorithm convergence is preserved since (−I_{qL})^T(−I_{qL}) is invertible [42, Lemma 9.2]. The associated augmented Lagrangian [42] of (35), with dual variable d and penalty parameter c, yields closed-form primal and dual updates (cf. (36)-(38)). The algorithm for S_mi estimation is summarized in Algorithm 7, while the entire fusion framework of Section II-C is summarized in Algorithm 8.
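The similarity-driven scenario switch of this subsection can be sketched as follows. The paper's exact F_i definition (23) is not reproduced here, so a normalized Frobenius-norm change between consecutive coarse HSIs is used as an assumed stand-in:

```python
import numpy as np

def similarity(X_prev, X_curr):
    """Illustrative dissimilarity score between consecutive coarse
    HSIs (2D band-by-pixel arrays). This is NOT the paper's F_i; it
    is an assumed stand-in: large value -> images are not alike."""
    num = np.linalg.norm(X_curr - X_prev)          # Frobenius norm of change
    den = np.linalg.norm(X_prev) + 1e-12
    return num / den

# scenario switch: rerun HyperCSI (scenario 1) only on a large change
X1 = np.ones((5, 8))
X2 = X1 + 0.01                 # nearly identical scene
F_thresh = 0.1                 # illustrative threshold
rerun_hypercsi = similarity(X1, X2) >= F_thresh    # scenario 2 if False
```

Whatever the concrete index, the design intent is the same: skip the expensive endmember re-extraction and warm-start the abundance updates from S_{i−1} whenever consecutive pairs are deemed alike.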

III. EXPERIMENTAL RESULTS
In this section, the proposed HSTSR method and five extended peer methods are compared to measure the similarity between the fusion results and ground-truth references.
In Subsection III-A, we give the criteria for data acquisition and introduce detailed information on the datasets, including data sources, preprocessing, and simulation data generation. In Subsection III-B, we explain the parameter settings of HSTSR. In Subsection III-C, we show the three datasets considered in this paper and the fusion results of HSTSR with and without noise.

[Algorithm 8 (framework of the proposed method in Section II-C): given X i , Y i , ∀i ∈ I T , B, and D, iterate over j until the predefined stopping criterion is met.]

Then, we show the fusion results of the proposed HSTSR together with those of the other peer methods and discuss their visual qualities. In Subsection III-D, the fusion results are evaluated with widely used quantitative indices.

A. DATA ACQUISITION
Since different satellites have their own calibration standards, fusing data across satellites can introduce non-linear effects. Thus, we test our algorithms on simulation data generated from a single satellite, which reduces these non-linear effects and thereby validates the CNMF criterion. However, even obtaining data of satisfactory quality is relatively hard (one of the problems this paper aims to overcome), since the following criteria must be met: 1) The area under study should not be heavily fogged or cloud-masked throughout the time span.
2) The acquisition date interval of the studied area should be similar between each image.
3) The studied area should have visible changes in the studied time range. 4) The studied area should contain a variety of scenery (buildings, farmlands, vegetation, etc.), so that more ground materials can be included to validate the functionality of the STSR algorithms. To cope with these demanding conditions, we focus on inland areas in subtropical/tropical savanna regions, where clouds and moisture do not easily accumulate.
For satellite selection, we choose the widely studied Hyperion push-broom imager aboard NASA's Earth Observing-1 (EO-1) satellite. The acquired FSFT HSIs (i.e., Z i , i ∈ I T , T := 4 in our work) in the three datasets are obtained from the well-known USGS Earth Explorer [45]. Dataset 1 is acquired over Bakersfield, USA, dataset 2 over Doña Ana, USA, and dataset 3 over Mongu Nalolo, Zambia. The time intervals of the FSFT HSIs in the three datasets are approximately 1, 9, and 2 months, resp.; their position coordinates and corresponding dates are detailed in Table 1. Note that in the third dataset, while the image on April 16, 2014 is available, its average value is coincidentally very close to zero, which makes the SFSDAF algorithm generate errors; thus, the image on April 29, 2014 is chosen as input instead. Then, we use the ENVI 5.3 software to preprocess the HSIs: images are bad-band-removed (bands 1-10, 104-116, 152-170, 215-224) [46], destriped, radiometrically calibrated, atmospherically corrected, and normalized to reduce sensor errors [47]. After preprocessing, each HSI contains M = 163 spectral bands (ranging from 427 to 2355 nm), and L = 204 × 204 (first two images) or L = 192 × 192 (last image) pixels. The EO-1 data are co-registered via three Landsat 8 OLI images, which also underwent radiometric and atmospheric corrections. The CSFT HSIs (i.e., X i , i ∈ I 4 ) are generated by applying a Gaussian blurring kernel with variance v = 3, followed by spatial subsampling with factor r = 6. Each Y i , i ∈ {1, 4} is generated by averaging the band regions corresponding to Landsat 8 OLI bands 1-7 (covering the 437-447, 457-508, 539-590, 641-661, 854-875, 1578-1648, and 2113-2285 nm regions, resp.). In addition, to examine the robustness of the algorithm, 30 dB signal-to-noise ratio (SNR) additive Gaussian independent and identically distributed (i.i.d.) noise is applied to the datasets.
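The degradation pipeline just described (Gaussian blurring with variance v = 3, subsampling by r = 6, band-group averaging, and 30 dB i.i.d. Gaussian noise) can be sketched as follows. This is an illustrative Python/NumPy rendition with hypothetical helper names and a random stand-in for a real HSI, not the exact processing code used in the experiments:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def add_noise_snr(img, snr_db=30.0, rng=None):
    """Add i.i.d. Gaussian noise at a given SNR (in dB)."""
    rng = np.random.default_rng(rng)
    signal_power = np.mean(img ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10.0))
    return img + rng.normal(0.0, np.sqrt(noise_power), img.shape)

def make_csft(z, var=3.0, r=6):
    """Simulate a CSFT HSI X_i: Gaussian blur (variance var) then r-fold subsampling."""
    blurred = gaussian_filter(z, sigma=(np.sqrt(var), np.sqrt(var), 0))
    return blurred[::r, ::r, :]

def make_msi(z, band_groups):
    """Simulate an FSCT MSI Y_i by averaging groups of hyperspectral bands."""
    return np.stack([z[:, :, lo:hi].mean(axis=2) for lo, hi in band_groups], axis=2)

# Toy FSFT HSI stand-in: 204x204 pixels, 163 bands, values in [0, 1).
z = np.random.default_rng(0).random((204, 204, 163))
x = make_csft(z)                      # CSFT HSI, 34x34 pixels, 163 bands
y = make_msi(z, [(0, 20), (20, 60)])  # toy 2-band MSI; real groups follow OLI bands 1-7
```

In the actual setup the band groups would be the hyperspectral band indices covering the seven OLI wavelength regions listed above.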
Owing to a lack of hyperspectral STSR peer methods to compare with, we extend the usage of the benchmark multispectral STSR methods, including STARFM [12], Fit-FC [14], SFSDAF [17], RASDF [18], and OBSTFM [19], which are introduced in Section I. X i , i ∈ I 4 and Z 1 are used as inputs for the peer methods since all of the peer methods specialize in estimating FSFT images by a CSFT image set and a fine spatial resolution image with identical spectral resolutions.

B. EXPERIMENTAL SETTING
Experimental details are provided and categorized corresponding to Sections II-A, II-B, and II-C. The parameter values are summarized in Table 2. The number of ADMM iterations and the penalty parameter c are set to 1 and 5, resp., for all algorithms [30]. The number of iterations associated with index j in Algorithm 8 is set according to the two scenarios decided by F i (cf. (23)) to balance algorithm speed and accuracy.
1) In Section II-A, the momentum term γ and the learning rate η are set according to [38]. The regularizer parameter λ D is set such that the values do not deviate too much from the optimal results [7]. The first variance step size v 0 is set to balance convergence speed and gradient normalization (i.e., the smaller v 0 is, the steeper the gradient descent iterates, but the more iterations are needed for convergence). 2) In Section II-B, the step size t and the backtracking step size β are set to reasonable initial values since the Lipschitz constant of (16) is hard to compute [32]. The regularizer parameter λ N is set to a small value to prevent overfitting. Due to the small values involved in (15), we multiply the FISTA inputs component-wise by a scaling coefficient b so that the gradients in FISTA become feasible for iterations (i.e., to prevent truncation errors in Mathworks MATLAB [48]). Consequently, the gradient step size t is also scaled in the FISTA framework, making it independent of the data properties. Afterwards, the estimated results are divided component-wise by b. 3) In Section II-C, the threshold F is set empirically to categorize the estimations over the whole time series into an equal number of scenarios for fusion, such that both scenario types can be utilized sufficiently. Through empirical tuning, the outer-loop iterations (cf. j in Algorithm 8) of the first and second scenarios are set to 40 and 10, resp., to balance speed and accuracy. Multiplied by the inner two (A i , S ai ) and three (A i , S ai , S mi ) ADMM iterations of scenarios one and two, resp., the iteration ratio of scenario one over scenario two is 2.67 (i.e., 80/30); through the threshold criterion, we thus achieve a 2.67 times faster convergence for similar image sets. The weights w 1i and w 2i , associated with the data fitting terms, are all set to 1 by default. However, intuitively, if the spatial blurring ratio r is smaller, w 1i should be set larger since X i is then more similar to Z i .
By empirical observation, if Y i overlaps more hyperspectral bands, w 2i should be set smaller (e.g., 0.9) for better performance. Intuitively, q is related to the size of the estimated simplex (i.e., the number of utilized X i ): if more X i are considered, q should be increased to capture the additional non-trivial endmembers. However, this induces a trade-off between performance and computational time, since increasing q enlarges the dimensions of the data processed by HyperCSI. λ A , λ S , λ a , and λ m are set according to [30], and can be modified if users have prior knowledge of their inputs. Finally, all of the experiments are executed on Mathworks MATLAB R2021a, running on a laptop equipped with a Core i5-9300H CPU with 2.40-GHz speed and 4-GB RAM.
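The component-wise scaling trick described in item 2 can be illustrated on a toy quadratic objective. The sketch below uses plain gradient descent standing in for the full FISTA iterations of (15)-(16); all names are hypothetical, and the point is only the scale-then-unscale pattern:

```python
import numpy as np

def grad_step_scaled(data, b=1e6, t=0.5, n_iter=100):
    """Minimize 0.5*||x - data||^2 by gradient descent in a b-scaled domain.

    The data (the algorithm inputs) are multiplied component-wise by b so that
    gradient magnitudes stay well above floating-point truncation levels; the
    estimate is divided by b afterwards, making the step size t independent of
    the data scale.
    """
    d = b * data                 # component-wise input scaling
    x = np.zeros_like(d)
    for _ in range(n_iter):
        x = x - t * (x - d)      # gradient of the scaled quadratic objective
    return x / b                 # undo the scaling

# Tiny-magnitude data for which unscaled gradient steps could be problematic.
a = np.full(4, 1e-12)
x_hat = grad_step_scaled(a)
```

For this quadratic the minimizer is recovered exactly up to the geometric convergence factor; in the paper's setting the same pattern wraps the FISTA iterations instead of plain gradient descent.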

C. QUALITATIVE ANALYSIS
First, the images associated with HSTSR are grouped and shown. Figures 1, 2, and 3 are configured as follows: the first, second, third, and fourth rows contain the CSFT HSIs X i , the FSCT MSIs Y i , the estimated FSFT HSIs Z i , and the reference FSFT HSIs Z i , ∀i ∈ I 4 , resp. Images (a) to (d), (c), and (h) are the inputs of HSTSR, while images (f), (g), and (i) to (l) are the estimated results.
Then, the fusion results of HSTSR are compared visually with those of the extended peer methods. Enlarged areas are shown on the bottom-left corner of these images to further enhance observability. In Figures 4, 5, and 6, dates 2 to 4 correspond to the first to third rows of the figures. The first column contains the reference images, while the second, third, fourth, fifth, sixth, and seventh columns contain the images estimated by Fit-FC, SFSDAF, STARFM, RASDF, OBSTFM, and HSTSR, resp. By observing the enlarged areas of the three datasets, Fit-FC cannot well preserve the spatial details of the FSFT images. SFSDAF generates abnormal shadows (cf. images (c) to (q) in Fig. 4) and produces blurred estimates in dataset 3. For STARFM, there exists an unnatural patch on the top-left corner of each of its results, while the other areas do not seem to be re-estimated and remain the same as the base image. RASDF incorrectly estimates the general hue of the images; moreover, in dataset 3, spatial details are scattered and inappropriately reconstructed. Unexpected edges appear in the results fused by OBSTFM, and abnormal patch colors can be observed. SFSDAF, RASDF, and OBSTFM cannot obtain good estimates when the base image Z 1 is considerably different from the other images (cf. images (q), (s), and (t)). Possible reasons for these fused results are summarized at the end of the next subsection.

D. QUANTITATIVE ANALYSIS
We adopt Wald's protocol [49] to quantitatively measure the results fused by HSTSR and to design our experiments. According to Wald's protocol, the results output by the proposed HSTSR should be compared with the reference images to measure their similarity. The standard quantitative indices [30], [50], [51] include the peak signal-to-noise ratio (PSNR) for spatial quality evaluation, the spectral angle mapper (SAM) for spectral quality evaluation, and the root-mean-squared error (RMSE), erreur relative globale adimensionnelle de synthèse (ERGAS), and universal image quality index (UIQI) for global quality evaluation.
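For reference, minimal Python/NumPy implementations of three of these indices (PSNR averaged over bands, global RMSE, and SAM in degrees) might look as follows; exact conventions vary slightly across papers, so treat these as illustrative definitions rather than the precise formulas of [50], [51]:

```python
import numpy as np

def psnr(ref, est, peak=1.0):
    """Peak signal-to-noise ratio (dB), averaged over spectral bands."""
    mse = np.mean((ref - est) ** 2, axis=(0, 1))        # per-band MSE
    return float(np.mean(10 * np.log10(peak ** 2 / mse)))

def rmse(ref, est):
    """Global root-mean-squared error."""
    return float(np.sqrt(np.mean((ref - est) ** 2)))

def sam(ref, est, eps=1e-12):
    """Spectral angle mapper (degrees), averaged over pixels."""
    r = ref.reshape(-1, ref.shape[-1])
    e = est.reshape(-1, est.shape[-1])
    cos = np.sum(r * e, axis=1) / (
        np.linalg.norm(r, axis=1) * np.linalg.norm(e, axis=1) + eps)
    return float(np.degrees(np.mean(np.arccos(np.clip(cos, -1.0, 1.0)))))
```

All three operate on height × width × bands arrays; SAM is zero for spectra that differ only by a positive scale factor, which is why PSNR and RMSE are reported alongside it.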
The quantitative indices of the proposed HSTSR and the peer methods are summarized in Tables 3, 4, and 5. Boldfaced indices in the tables represent the best values for each quantitative index (i.e., highest PSNR/UIQI, lowest ERGAS/RMSE/SAM, and fastest computational time). Since we estimate 4 FSFT HSIs simultaneously, the computation time is divided by 4 to compare with the peer methods.
In general, we observe that HSTSR attains the best or second-best performance in the indices, indicating that even under 30 dB noise corruption, the algorithm can still handle hyperspectral STSR well. In the tables, although SFSDAF performs slightly better than HSTSR in some quantitative indices at dates 2 and 3, this cannot compensate for the substantial time reduction of HSTSR. Moreover, where SFSDAF is superior, its results differ from those of HSTSR only by a small amount, while a notable difference in favor of HSTSR can be observed otherwise.
In terms of the amount of information, the fine spatial resolution image available to the peer methods (i.e., Z 1 , 163 bands) is 11.64 times larger than the FSCT MSIs used in our proposed method (i.e., Y 1 and Y 4 , with a total of 14 bands). In terms of time reduction, the proposed method is significantly faster than the peer methods, since we neither require band-wise computation nor generate weighting coefficients associated with sliding windows. To quantify the time improvement of the proposed HSTSR, the total time needed for estimating all the FSFT HSIs by each peer method is divided by that of HSTSR; these ratios are summarized in Table 8. Meanwhile, we have also successfully estimated Z 1 , B, and D, which are assumed known a priori in the spatial/spectral peer methods and in [30], [31], resp. The robustness of the B and D estimation (cf. Section II-A) is indicated by the results in Table 7, where the performances without noise (calculated by averaging the percentage errors of the estimated v and D) are included. As can be seen, although the errors increase when noise is present, they still remain within a 1% range, indicating that the newly proposed gradient descent-ADMM cooperative algorithm is robust and accurate.
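The 11.64× input-data figure and the percentage-error metric of Table 7 can be reproduced in a few lines; the helper below is a hypothetical sketch of the averaging described above, not the authors' exact evaluation code:

```python
import numpy as np

# Input-data ratio: fine-spatial bands available to the peer methods (Z_1,
# 163 hyperspectral bands) vs. to HSTSR (Y_1 and Y_4, 7 OLI-like bands each).
ratio = 163 / (7 + 7)

def avg_pct_error(estimates, truths):
    """Average absolute percentage error, as used to assess the v and D estimates."""
    e = np.asarray(estimates, dtype=float)
    t = np.asarray(truths, dtype=float)
    return float(np.mean(np.abs(e - t) / np.abs(t)) * 100.0)
```

For example, an estimated blur variance of 3.03 against the true v = 3 corresponds to a 1% error, the order of magnitude reported in Table 7.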
Next, possible reasons for the inferior estimated results (those of the peer methods and the proposed HSTSR included) are provided. For Fit-FC, although it can capture seasonal changes very well [52], it is based on the assumption of stable land cover boundaries for MSIs, which is not suitable for HSIs, since a multispectral band spreads unevenly among several hyperspectral bands. For SFSDAF, a number of customized assumptions are made prior to the implementation of its endmember and abundance map estimations, so they are not as robust as HyperCSI and ADMM. For STARFM, since areas do not necessarily exhibit equal texture densities, and the edges of endmembers in a local search window can be heterogeneous, it is difficult for STARFM to search for similar pixels in a hyperspectral image. For RASDF, the reliability index and threshold coefficient are computed band-wisely to acquire reliable pixels that do not suffer significant system distortions while experiencing large changes; however, for hyperspectral images, pixel information is widely distributed over the bands, and band-wise evaluation cannot well capture the reliable pixels. For OBSTFM, since features are distributed unevenly among the bands, data segmentation can lead to wrongly classified pixels, inducing abnormal object edges and patch colors.
The main shortcoming of our algorithm lies in the relatively poorly estimated Y 2 and Y 3 in Section II-B. Table 6 reports the fused results on Y 2 and Y 3 . As can be seen, the results of Dataset 3, Date 2 are the worst, which corresponds to the inferior performance of HSTSR compared to the peer methods in Table 5. This phenomenon is reasonable, since only degraded images are available in the data fitting term of (14). The blocky artifacts are induced by the uniform residual regularizing term in (14): despite its ability to preserve spatial details, its side effect is that patches are offset differently, resulting in discontinuities in the estimated results. In addition, we observe that the performance on Y 2 and Y 3 decreases slightly when the time interval increases, since the seasonal changes are blurred in X 2 and X 3 ; for this reason, the PSNR in the second dataset is smaller than that in the first dataset.

IV. CONCLUSION
To overcome hardware trade-offs and reduce costs, spatial/spectral and spatial/temporal super-resolution have been extensively studied by many researchers. This paper introduces an optimization-based three-dimensional hyperspectral STSR method to obtain FSFT HSIs, a problem seldom investigated by past methods. We propose a new gradient descent-ADMM cooperative method to automatically generate the matrices previously deemed a priori, and we design an abundance-map-based change matrix estimation scheme to achieve fast hyperspectral STSR.
In terms of performance, despite noise corruption, the auto-estimation errors were within 1% (cf. Table 7). Results in Tables 4, 5, and 6 show that the proposed HSTSR produces comparable (cf. dates 2 and 3), or even more accurate (cf. dates 1 and 4), results than the extended multispectral peer methods. Moreover, HSTSR requires 11.64 times less input data than the peer methods, and similar image sets can reach a 2.67 times faster convergence (cf. Section III-B, enumeration 3) in HSTSR. Total computational time performances compared to the peer methods are summarized in Table 8. In addition, HSTSR neither makes assumptions on the input images nor computes band-wise weights, which is time-consuming given that the weights are used only once.
For future work, we will continue to investigate frameworks and methods for hyperspectral STSR. The cooperation of iterative algorithms enables more challenging data structures to be designed and solved. Hybrid models, such as the integration of classical- and optimization-based methods, also provide interesting prospects for hyperspectral STSR. The trade-off between preserving spatial details and avoiding blocky artifacts observed in the experiments also needs further improvement, e.g., by designing more complex and robust regularizers. In addition, the regularizers in our mathematical formulations can be substituted by more customized versions, including algorithms that specifically deal with hyperspectral images, such as image destriping, denoising, and registration error compensation. By transforming these algorithms into minimization functions, a plug-and-play scheme can be implemented based on the proposed framework.

A. PROOF OF GUARANTEED SPARSITY OF THE CHANGE SUB-MATRICES
Assume w.l.o.g. that we are at time i, and that there exist two sparse abundance maps S i and S i−1 . During the transition from one map to the other, the elements inside the maps can only undergo three scenarios, and by observing that the change sub-matrices S ai and S mi deal with adding and subtracting values, resp., their relationship with the scenarios is summarized in Table 9. As can be seen, each matrix takes nonzero values in one out of the three scenarios. Even in the worst case where S i−1 and S i have complementary supports (i.e., if an element is zero in S i−1 , then the corresponding element in S i is nonzero), making the total transition sparsity equal to zero, the change sub-matrices can still maintain sparsity equal to that of S i−1 owing to the split of the transition values: for each modification in the maps, only one element obtains a value in the change sub-matrices. This guarantees sparsity, whereas in the undecoupled S i case the sub-matrices add up and deteriorate the sparsity of S i , which can be seen by combining the columns in Table 9.

He is currently pursuing the Ph.D. degree with the Intelligent Hyperspectral Computing Laboratory, Institute of Computer and Communication Engineering, National Cheng Kung University, Tainan, Taiwan. His research interests include deep learning, convex optimization, tensor completion, and hyperspectral imaging.
Mr. Tang was a recipient of the Outstanding Paper Award from the Chinese Image Processing and Pattern Recognition Society (IPPR) Conference on Computer Vision, Graphics, and Image Processing (CVGIP), in 2020.
YANGRUI LIU (Student Member, IEEE) received the B.S. degree from the Department of Communications Engineering, Feng Chia University, Taiwan, in 2019. He is currently pursuing the Ph.D. degree with the Intelligent Hyperspectral Computing Laboratory, Institute of Computer and Communication Engineering, National Cheng Kung University, Taiwan. His research interests include convex optimization, deep learning, and hyperspectral imaging.