Double Prior Network for Multidegradation Remote Sensing Image Super-Resolution

Image super-resolution (SR) is widely used in remote sensing because it can effectively increase image details. Neural networks have shown remarkable performance in recent years, benefiting from end-to-end training. However, remote sensing images are affected by a variety of degradation factors, and compared with reconstruction-based approaches, neural networks lack flexibility in dealing with these complex issues. Traditional neural network methods also cannot take advantage of prior knowledge and lack interpretability. To develop a flexible, accurate, and interpretable algorithm for remote sensing SR, we propose an effective SR network called YSRNet, which unfolds a traditional optimization process into a learnable network. Combining conventional reconstruction-based methods and neural networks can significantly improve the algorithm's performance. Since the gradient features of remote sensing images contain valuable information, total variation constraints and deep prior constraints are introduced into the objective function for image SR. Furthermore, we propose an enhanced version called YSRNet+, which can apply attention weights to different prior terms and channels. Compared with YSRNet, YSRNet+ enables the network to focus more on useful prior information and improves its interpretability. Experiments on three remote sensing datasets are performed to evaluate the algorithm's effectiveness. The experimental results demonstrate that the proposed algorithm performs better than some state-of-the-art neural network algorithms, especially in multidegradation scenarios.


I. INTRODUCTION
Remote sensing images are widely used in target recognition and detection [1], [2], [3], [4], [5], [6], land classification [7], [8], [9], [10], [11], [12], resource exploration [13], etc. However, because remote sensing images usually cover a large ground area, their resolution is often low, which degrades performance in real-world applications. This problem can be addressed from two aspects: hardware [14], [15], [16], [17] and algorithms [18], [19], [20], [21]. Because improving resolution through hardware sharply increases cost, algorithmic remote sensing image super-resolution (SR) is more widely used. The original remote sensing images degrade to varying degrees owing to natural noise, motion blurring, and hardware limitations, resulting in a loss of information [22], [23], [24]. These three degradation factors together constitute a multidegradation image SR task. Image SR aims to reconstruct a high-quality image from one or more low-quality images; SR algorithms need to restore as much detail as possible and present target information. This process is the inversion of image degradation. The degradation model can be represented as
$$y = Ax + n \tag{1}$$
where A denotes the combined effect of the downsampling matrix and the blurring kernel, x and y represent the high-quality and low-quality images, respectively, and n is white Gaussian noise.
Remote sensing image SR is divided into single-image SR (SISR) [25], [26], [27] and multi-image SR (MISR) [28], [29], [30]. MISR algorithms use the relationships between multiple images to obtain additional information, and neural networks have also been applied to MISR. DeepSUM is a deep neural network for unregistered multitemporal image SR, which exploits both spatial and temporal correlations to combine multiple images [31]. Liu et al. proposed a novel MISR network called the progressive multiscale deformable residual network, which aims to improve the spatial resolution of sea ice passive microwave images according to the characteristics of both passive microwave images and sea ice motion [32]. However, compared with MISR methods, SISR algorithms are more widely used, because SISR is a more direct and effective way to improve image quality and requires no additional images. SISR can be divided into interpolation-based, reconstruction-based, and learning-based methods. Interpolation-based methods include nearest neighbor interpolation, bilinear interpolation, cubic interpolation, etc. Some scholars have proposed improved interpolation-based SR algorithms, such as [33], [34], [35].
Reconstruction-based methods model the degradation process from high-resolution (HR) images to low-resolution (LR) images. For instance, Li et al. proposed a maximum a posteriori (MAP) method based on a universal hidden Markov tree (HMT) model for remote sensing image SR [36]. The HMT model provides a prior for reconstructing SR images from a sequence of warped, blurred, subsampled, and noise-contaminated LR images. Schultz et al. [37] proposed a Bayesian approach that reconstructs HR images from LR images by MAP estimation.
Learning-based methods are mainly based on sparse representation and neural networks. Yang et al. [38] first proposed an image SR method based on sparse representation. This method establishes the relationship between LR and HR images for image SR by learning a redundant dictionary. Then, Zheng et al. [39] first applied the sparse representation to remote sensing image SR. After that, Hou et al. [40] proposed a sparse representation and global union dictionary model. It utilizes nonlocal self-similarity to obtain global constraints and improve the performance of image SR.
In recent years, neural networks have been widely used for image SR. Dong et al. [41] introduced a convolutional neural network into image SR in 2015 and achieved good results. Liebe et al. [42] applied the SRCNN network to remote sensing image SR. Furthermore, Dong et al. [43] proposed FSRCNN based on SRCNN. After that, Lei et al. [44] proposed LGCnet, which exchanges local and global information by cascading shallow and deep feature mappings. Since He et al. [45] proposed the residual network, it has been widely used for SR tasks. Kim et al. [46] proposed VDSR, which addresses the difficulty of training very deep networks. For remote sensing images, Haut et al. [47] proposed an attention-based network called RSRCAN, which improves network performance at a small computational cost.
Recently, unfolding methods that combine traditional optimization algorithms with neural networks have been proposed for image reconstruction [48], [49], [50], [51], [52], [53]. They unfold the traditional optimization process into a network, so the parameters of the algorithm are learned through training instead of being designed by hand. Zhang et al. [54] proposed USRNet for multidegradation tasks and obtained good results; however, USRNet uses only deep prior information for image SR. Luo et al. proposed DAN for image SR, which performs kernel estimation and image restoration in a single end-to-end learning process and thereby improves accuracy.
Different algorithms have their own characteristics. Interpolation-based SR algorithms are fast, but their performance is limited. Reconstruction-based methods and sparse representation can fully use prior information, but they are computationally expensive. Neural networks perform well; however, they lack interpretability and, owing to their structure, cannot flexibly use traditional experience and knowledge [54]. Deep unfolding networks retain the strong fitting ability of neural networks while flexibly exploiting prior knowledge, so they can handle multidegradation tasks well. However, as far as we know, the unfolding method has not been widely used in remote sensing, and some hyperparameters and prior information are not yet well combined with neural networks.
The number of remote sensing images available for SR is limited, and interpretability is essential for remote sensing SR algorithms. Moreover, remote sensing SR involves multiple degradation factors, which requires the algorithm to be highly adaptable. To improve the algorithm's flexibility and interpretability, we propose YSRNet for multidegradation remote sensing image SR. Compared with traditional neural network methods for remote sensing image SR, we use the variable splitting method to unfold the optimization process into a network, which significantly improves performance while maintaining interpretability. In addition, most deep unfolding algorithms have only one prior term, so much of the knowledge in traditional optimization algorithms has not yet been exploited. Because the gradient domain of remote sensing images contains plenty of details [55], [56], [57], we use a total variation (TV) prior for image SR. TV features are essentially the image gradients along the row and column directions, and in an image, edge gradients carry more information than smooth regions. Therefore, we extract the gradient information separately, which makes it easier for the network to obtain and facilitates image reconstruction; this approach also increases the number of feature maps in the network. The network consists of two interpretable modules, each performing a different function, and TV features are introduced into the network as prior knowledge. Furthermore, we propose an enhanced version, YSRNet+, by introducing an attention module. Unlike traditional attention modules, this module can also assign weights to different prior constraint terms.

Fig. 1. Effects of different blurring kernels on images. Columns 1-3 denote the isotropic Gaussian kernel, the anisotropic Gaussian kernel, and the motion blurring kernel, respectively.
The attention weights can be interpreted as the penalty factors for additional prior terms in traditional algorithms. It improves the network performance while further increasing the interpretability of the network. It is worth noting that this improvement only requires a small amount of computation. The main contributions of this article are summarized as follows.
1) We propose a deep unfolding strategy for a new optimization problem that consists of a reconstruction term, a deep prior term, and a TV prior term. We map the optimization algorithm to a network, building a bridge between the TV prior and neural networks.
2) A new network module called Ynet is designed to perform noise removal. The TV prior is introduced into this module for feature enhancement, which helps the network extract image information more efficiently. By further introducing an attention module, we propose an enhanced version that gives a more reasonable explanation for combining the hyperparameters with neural networks and makes the network pay more attention to essential feature maps.
3) Considering various degradation factors, we perform image SR under combined degradations, which makes the task more general.
The rest of this article is organized as follows. Section II briefly introduces the degradation models and the principles of deep unfolding networks. Section III presents the proposed remote sensing SR algorithm together with its enhanced version, YSRNet+. Section IV shows the experimental results and analysis. Finally, Section V concludes this article.

A. Degradation Models
During remote sensing image acquisition, the image is usually affected by various degradation factors owing to the equipment and the environment. To make the SR model realistic, this article considers several typical degradation factors, including image blurring, random noise, and image downsampling. Image blurring is divided into three categories: isotropic Gaussian blurring, anisotropic Gaussian blurring, and motion blurring. Motion blurring is caused by the relative motion between the aircraft and the ground, whereas Gaussian blurring is caused by inaccurate camera focus and atmospheric turbulence. Hence, the degradation model can be expressed as
$$y = (k \otimes x_h)\downarrow_S + n$$
where n denotes white Gaussian noise. The loss function can be expressed as
$$\mathcal{L}(x) = \|y - (k \otimes x)\downarrow_S\|_F^2 \tag{2}$$
where $x_h$ denotes the HR image, k denotes the blurring kernel, ⊗ denotes convolution, and $\downarrow_S$ denotes downsampling by the factor S, which is usually 2, 3, or 4. y denotes the observed LR image. Fig. 1 shows various kinds of blurring kernels, where (a), (b), and (c) denote the isotropic Gaussian kernel, anisotropic Gaussian kernel, and motion blurring kernel, respectively.
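The degradation model above can be simulated as follows. This is a minimal NumPy sketch, not the authors' data pipeline: it assumes circular boundary conditions for the convolution, takes $\downarrow_S$ to keep the top-left pixel of each S × S block, and treats images as arrays in [0, 1] so that the noise standard deviation is σ/255 (the convention used in the experiments). The names `psf2otf` and `degrade` are hypothetical helpers.

```python
import numpy as np

# Illustrative sketch; function names are hypothetical, not from the paper's code.


def psf2otf(k, shape):
    """Zero-pad kernel k to `shape` and circularly shift its center to (0, 0),
    so that fft2 of the result is the transfer function of circular convolution."""
    pad = np.zeros(shape)
    kh, kw = k.shape
    pad[:kh, :kw] = k
    pad = np.roll(pad, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.fft2(pad)


def degrade(x_h, k, s, sigma, rng):
    """Simulate y = (k ⊗ x_h)↓s + n for a grayscale image x_h with values in [0, 1].

    Assumptions: circular convolution, decimation that keeps every s-th pixel,
    and zero-mean white Gaussian noise with standard deviation sigma / 255.
    """
    blurred = np.real(np.fft.ifft2(psf2otf(k, x_h.shape) * np.fft.fft2(x_h)))
    y = blurred[::s, ::s]                       # s-fold downsampling
    return y + rng.normal(0.0, sigma / 255.0, size=y.shape)


rng = np.random.default_rng(0)
x_h = rng.random((48, 48))
box = np.full((5, 5), 1.0 / 25.0)               # simple normalized blur kernel
y = degrade(x_h, box, s=3, sigma=7, rng=rng)    # LR observation of shape (16, 16)
```

Because the kernel is normalized to sum to 1, its transfer function equals 1 at zero frequency, so blurring preserves the mean intensity of the image.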
Blurring causes neighboring image structures to overlap, noise introduces false information, and downsampling causes the loss of image details. In this article, we perform image SR under all three degradation conditions, considering isotropic Gaussian kernels of various sizes, anisotropic Gaussian kernels of multiple orientations, and motion blurring kernels of different trajectories.
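The isotropic and anisotropic Gaussian kernels mentioned above can be generated as sketched below. The parameterization (standard deviations along two principal axes plus a rotation angle θ) is a common convention and an assumption here; the kernel sizes and parameters actually used come from [54] and are not reproduced. Motion blurring kernels, which follow random trajectories, are not covered by this sketch, and `gaussian_kernel` is a hypothetical helper name.

```python
import numpy as np

# Illustrative sketch; `gaussian_kernel` is a hypothetical helper, not from the paper.


def gaussian_kernel(size, sigma1, sigma2=None, theta=0.0):
    """Normalized size×size Gaussian blur kernel.

    sigma1, sigma2: standard deviations along the two principal axes (pixels).
    theta: rotation of the principal axes in radians.
    sigma2=None gives an isotropic kernel (sigma2 = sigma1, theta irrelevant).
    """
    sigma2 = sigma1 if sigma2 is None else sigma2
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    cov = rot @ np.diag([sigma1 ** 2, sigma2 ** 2]) @ rot.T
    inv_cov = np.linalg.inv(cov)
    ax = np.arange(size) - (size - 1) / 2.0       # coordinates centered on the kernel
    xx, yy = np.meshgrid(ax, ax)                  # xx varies along columns, yy along rows
    coords = np.stack([xx, yy], axis=-1)          # shape (size, size, 2)
    mahal = np.einsum("...i,ij,...j->...", coords, inv_cov, coords)
    k = np.exp(-0.5 * mahal)
    return k / k.sum()                            # blur kernels must sum to 1


iso = gaussian_kernel(15, 2.0)                             # isotropic
aniso = gaussian_kernel(15, 4.0, 1.0, theta=np.pi / 4)     # anisotropic, rotated 45 degrees
```

Varying `size` and `sigma1` yields isotropic kernels of different extents, and varying `theta` with unequal standard deviations yields anisotropic kernels of different orientations.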

B. Deep Unfolding Networks
Unfolding methods combine reconstruction-based and learning-based algorithms. The principle of deep unfolding is to decompose a complex problem into simple subproblems that are solved independently. Subproblems such as deblurring and scaling are solved by traditional methods because they have closed-form solutions.
There is no closed-form solution for the noise removal problem, and the noise removal results of traditional reconstruction-based methods are limited. Thus, we use neural networks to perform noise removal. The closed-form computation and the denoising networks together constitute the unfolding network. Each part of the unfolding network has a definite function and meaning, and different closed-form solutions can handle different conditions. Therefore, the network has great flexibility and interpretability, which makes it suitable for multidegradation problems and for problems with high interpretability requirements, such as remote sensing image SR.
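The division of labor described above, a closed-form model-based step alternating with a denoising step, can be illustrated with the following toy loop. It is a simplified sketch under explicit assumptions: it handles deblurring only (no downsampling, s = 1), uses circular convolution, and replaces the learned denoiser with a hypothetical 3 × 3 mean filter (`box_denoiser`). In an actual unfolding network, the per-stage weights `mus` and the denoiser parameters would be learned end to end.

```python
import numpy as np

# Toy unfolding loop; all function names are hypothetical, not from the paper.


def otf(k, shape):
    """Transfer function of circular convolution with kernel k (center moved to (0, 0))."""
    pad = np.zeros(shape)
    pad[:k.shape[0], :k.shape[1]] = k
    pad = np.roll(pad, (-(k.shape[0] // 2), -(k.shape[1] // 2)), axis=(0, 1))
    return np.fft.fft2(pad)


def data_step(y, K, z, mu):
    """Closed-form minimizer of ||y - k⊗x||^2 + mu*||x - z||^2 (s = 1):
    X = (conj(K) Y + mu Z) / (|K|^2 + mu) in the Fourier domain."""
    X = (np.conj(K) * np.fft.fft2(y) + mu * np.fft.fft2(z)) / (np.abs(K) ** 2 + mu)
    return np.real(np.fft.ifft2(X))


def box_denoiser(x):
    """Hypothetical stand-in for a learned denoiser: circular 3x3 mean filter."""
    shifts = [np.roll(x, (i, j), axis=(0, 1)) for i in (-1, 0, 1) for j in (-1, 0, 1)]
    return sum(shifts) / 9.0


def unfold(y, k, mus, denoiser=box_denoiser):
    """Alternate the closed-form data step and the denoiser for len(mus) stages;
    each stage uses its own penalty weight, as in a stage-wise unfolded network."""
    K = otf(k, y.shape)
    z = y.copy()
    for mu in mus:
        x = data_step(y, K, z, mu)   # model-based step (closed form)
        z = denoiser(x)              # prior step (learned in an unfolding network)
    return z
```

For example, `unfold(y, k, mus=[0.5, 0.3, 0.2, 0.1, 0.05, 0.02])` runs six stages, matching the N = 6 used in the experiments. Only the denoiser is a candidate for a neural network; the data step stays exact and interpretable.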

III. PROPOSED SCHEME
Based on the degradation model in Section II-A, the unfolding approach can be divided into two parts, i.e., the establishment of the optimization model and the design of the unfolding network. Most unfolding methods use only one kind of prior information as the regularization term in the objective function. Equation (3) shows the basic objective of unfolding networks, where R(x) denotes the image degradation process and ϕ(x) denotes the prior information of images, such as a sparse prior or a deep prior. Using only one kind of prior information cannot fully mine the information in the images. Therefore, we propose reconstructing the target image with more than one kind of prior information. Using prior information essentially adds more input features to the neural network. This improves network performance and provides a bridge between traditional prior terms and neural networks. It is worth noting that this article aims to solve nonblind remote sensing image SISR, i.e., the blurring kernels are assumed to be known; they can be estimated by other estimation algorithms or by hardware devices.
In this section, we first introduce the decomposition of the multidegradation problem and the methods for solving the subproblems. Then, we present the design of the deep unfolding network framework. Finally, we present the algorithm flow of the unfolding network based on end-to-end training.

A. Optimization Methods
Among traditional optimization algorithms, TV regularization is a remarkable prior term. It exploits the gradient information of the target image, which makes the SR reconstruction results sharper. Combining it with (2), the objective function can be expressed as
$$\hat{x} = \arg\min_{x}\ \|y - (k \otimes x)\downarrow_S\|_F^2 + \alpha\,\phi(x) + \beta\,\psi_{TV}(x) \tag{4}$$
where the first term denotes the reconstruction term, ϕ(x) denotes the deep prior term, and $\psi_{TV}(x)$ denotes the TV prior term,
$$\psi_{TV}(x) = \sum_{i=1}^{h}\sum_{j=1}^{w}\left(|(\nabla_x x)_{i,j}| + |(\nabla_y x)_{i,j}|\right)$$
where h and w denote the height and width of the image, respectively, and $\nabla_x$ and $\nabla_y$ denote the image gradients along the two directions. α and β are the tradeoff parameters. Because (4) contains implicit terms, it cannot be solved directly; instead, variable splitting algorithms such as ADMM [58] can be used. By introducing the auxiliary variables z and m, (4) can be redefined as
$$\arg\min_{x,z,m}\ \|y - (k \otimes x)\downarrow_S\|_F^2 + \alpha\,\phi(z) + \beta\,\psi_{TV}(m) \quad \text{s.t.}\ z = x,\ m = x \tag{5}$$
The Lagrange function (6) is then defined, in which the penalty terms, including $\|x - m\|_F^2$, guarantee that the variables z, m, and x remain close to one another. $\mu_1$, $\mu_2$, and $\mu_3$ are the corresponding penalty parameters, which should be large enough to guarantee the convergence of the algorithm. Here, z and m denote the reconstructed images obtained through different kinds of prior knowledge in traditional optimization-based methods, and different prior terms impose different constraints on the final image. Finally, through the dual ascent method, the auxiliary variables are driven close enough to the solution variable to obtain the final reconstructed image. This problem can be solved by alternately computing the x-subproblem, z-subproblem, and m-subproblem; that is, (6) is decoupled into the three subproblems (7)-(9), so that the reconstruction term and the two regularization terms can be solved separately. Thus, (6) involves three subtasks, i.e., denoising, deblurring, and scaling. Deblurring and scaling are handled by (7), which has the closed-form solution (10), where F(·) and $F^{-1}(\cdot)$ denote the FFT and inverse FFT, respectively, and $\uparrow_S$ denotes the S-fold upsampler.
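Because the exact forms of (7) and (10) are not reproduced here, the following sketch solves an assumed x-subproblem, $\min_x \|y-(k\otimes x)\downarrow_S\|_F^2 + \mu_1\|x-z\|_F^2 + \mu_2\|x-m\|_F^2$, with circular convolution and decimation that keeps every S-th pixel. The two penalties merge into one quadratic term with weight $\mu = \mu_1 + \mu_2$ centered at $(\mu_1 z + \mu_2 m)/\mu$, after which the FFT-based closed form used in USRNet [54] applies: F(·) and $F^{-1}(\cdot)$ become `fft2`/`ifft2`, and $\uparrow_S$ is zero insertion. The function name `x_subproblem` is hypothetical, and the paper's (10) may differ in scaling or in how the previous estimate enters.

```python
import numpy as np

# Sketch of an FFT closed-form x-step; names are hypothetical, not the authors' code.


def otf(k, shape):
    """Transfer function of circular convolution with kernel k (center moved to (0, 0))."""
    pad = np.zeros(shape)
    pad[:k.shape[0], :k.shape[1]] = k
    pad = np.roll(pad, (-(k.shape[0] // 2), -(k.shape[1] // 2)), axis=(0, 1))
    return np.fft.fft2(pad)


def x_subproblem(y, k, z, m, mu1, mu2, s):
    """Closed-form minimizer of
        ||y - (k ⊗ x)↓s||_F^2 + mu1 * ||x - z||_F^2 + mu2 * ||x - m||_F^2
    for circular convolution and decimation x[::s, ::s].
    z, m: HR-sized estimates from the two prior branches; y: LR image.
    The HR height and width must be divisible by s.
    """
    H, W = z.shape
    mu = mu1 + mu2
    v = (mu1 * z + mu2 * m) / mu                  # merged quadratic penalty center
    K = otf(k, (H, W))
    y_up = np.zeros((H, W))
    y_up[::s, ::s] = y                            # zero-insertion upsampler ↑s
    R = np.conj(K) * np.fft.fft2(y_up) + mu * np.fft.fft2(v)

    def block_mean(A):
        # Average the s×s distinct blocks of an HR spectrum: the spectral effect of ↓s.
        return A.reshape(s, H // s, s, W // s).mean(axis=(0, 2))

    # Woodbury identity: only an LR-sized diagonal system has to be inverted.
    lr_term = block_mean(K * R) / (block_mean(np.abs(K) ** 2) + mu)
    X = (R - np.conj(K) * np.tile(lr_term, (s, s))) / mu
    return np.real(np.fft.ifft2(X))
```

The design choice matters for unfolding: every operation is an FFT or an elementwise product, so the step is cheap, exact, and differentiable with respect to $\mu_1$ and $\mu_2$, which is what allows such weights to be learned by backpropagation.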

B. Unfolding Network Framework
Inspired by the abovementioned deep unfolding methods, we propose a new network, called YSRNet, to solve (5). The overall framework of the network follows the process of variable splitting algorithms, and (7) can be solved efficiently by (10). In traditional optimization algorithms, z and m are computed by threshold shrinkage methods, such as singular value thresholding [59], [60], [61]. In this article, we use a network in place of threshold shrinkage to achieve better reconstruction performance.
The framework of YSRNet with N iterations is shown in Fig. 2. The upper part of Fig. 2 mainly shows the computation of the modules. y denotes the degraded image. LR images are input from the left, and the SR image x is generated after N iterations. The algorithm flow strictly follows the iterative process of the variable splitting optimization method, and the whole iteration is an end-to-end learnable process. Each iteration stage has its own parameters, which gives the network strong mapping ability. Because the target reconstruction is accomplished by (7), the network only needs to perform the simpler tasks of solving (8) and (9). In traditional optimization algorithms, the prior constraint mostly serves to remove noise, so this network also performs noise removal. As shown in (4), ϕ(x) denotes the deep prior term. The second regularization term we select is the TV term, which extracts the gradient information of the image. Traditional TV regularization is computed by multiplying the image by the gradient matrix $D_x$ in (11) and then summing the absolute values of all entries of $\nabla_x$ and $\nabla_y$ to obtain the $L_1$ norm, i.e., $\|x\|_{TV} = \|\nabla x\|_1$; for denoising tasks, the optimization objective is to minimize this sum. However, computing the $L_1$ norm discards structural information. Therefore, the gradient maps $\nabla_x$ and $\nabla_y$ are used as the network's input, and the $L_1$ norm of the TV regularization term becomes an implicit TV norm $\psi_{TV}(x)$ in the deep unfolding network. The gradients of the image along the row and column directions are obtained by (12), where $\nabla_x$ and $\nabla_y$ denote the gradient features of the TV term. The details of the deconvolution module D and the prior module P are provided next.
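The gradient features fed to the TV branch can be sketched with forward differences. Here the gradient matrix is built as a forward-difference matrix whose last row is zero, which is one common choice; the exact matrices in (11) and (12), including their boundary handling, are assumptions, as are the helper names. The sketch also shows the contrast drawn above: `tv_norm` collapses the gradients into a single $L_1$ value, whereas `tv_features` keeps the two gradient maps as network input channels.

```python
import numpy as np

# Illustrative sketch; helper names are hypothetical, not from the paper.


def diff_matrix(n):
    """n×n forward-difference matrix D: (D v)[i] = v[i+1] - v[i], last row zero."""
    D = np.eye(n, k=1) - np.eye(n)
    D[-1, :] = 0.0
    return D


def tv_features(x):
    """Gradient maps of a 2-D image x, stacked as two channels (shape (2, h, w))."""
    h, w = x.shape
    grad_x = x @ diff_matrix(w).T      # horizontal differences (along each row)
    grad_y = diff_matrix(h) @ x        # vertical differences (along each column)
    return np.stack([grad_x, grad_y])


def tv_norm(x):
    """Anisotropic TV: the L1 norm of both gradient maps (a single scalar)."""
    return np.abs(tv_features(x)).sum()


x = np.array([[0.0, 1.0, 3.0],
              [2.0, 2.0, 2.0]])
```

For this 2 × 3 example, the horizontal gradients are [[1, 2, 0], [0, 0, 0]], the vertical gradients are [[2, 1, -1], [0, 0, 0]], and the TV value is 7; the two maps retain where the edges are, while the scalar does not.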
1) Deconvolution (D) Module: The D module is a deconvolution module, presented as green blocks in Fig. 2, and it corresponds to (7). The input image is initialized by nearest-neighbor interpolation of the degraded image. Using (10), the x-subproblem is computed from the last iteration results of the P module and the related parameters, where $\mu_1$ and $\mu_2$ in (10) are learned by backpropagation. This step solves the data term $\|y - (k \otimes x)\downarrow_S\|_F^2$ after variable splitting. The inputs of this module are the variables z and m, the convolution kernel k, the sampling factor S, the weight coefficients $\mu_1$ and $\mu_2$, and the last reconstructed image $x_{k-1}$, as shown in Fig. 3. The output of the D module is the reconstructed image, which is also the input of the P module in the next iteration stage.
2) Prior (P) Module: The P module is the prior module, dedicated to removing noise from the input images; it acts as a denoiser. Inspired by the Res-Unet proposed in [62], we design a new network with multiple feature inputs for this module. The module corresponds to (8) and (9) and solves the z-subproblem and m-subproblem with neural networks; the results are fused and approximate the HR images through end-to-end training. Res-Unet combines Res-blocks and U-net. The size of the input images is b × n × s × s, where s denotes the height and width of the images, and b and n denote the batch size and the number of channels, respectively. The network can be divided into two parts: the first part acts as a feature extractor, and the second part acts as a data reconstructor. Since the network structure is Y-shaped, we call it Ynet. For each input feature, the feature extractor extracts features at different scales to fully mine the intrinsic information of the image. Feature extraction is accomplished by several Res-blocks, and each scale transition is performed by a pooling layer, which halves the height and width of the feature maps and doubles the number of channels. The design of the Res-block is the same as in [62]: it contains two convolution layers and one ReLU layer, and a skip connection links its input to its output, as given in (13). Taking a scale factor of 2 as an example, the five scales in the feature extraction process are 256, 128, 64, 32, and 16, and the corresponding numbers of channels are 16, 32, 64, 128, and 256, respectively. After deep prior and TV prior feature extraction, we merge the two parts. Notably, fusion here refers to channel concatenation, because simply adding the two parts may lose information. This completes the feature extraction of the two branches. The second part is the upsampling process, which is the inverse of feature extraction: the image scale changes from 16 to 256, and the number of channels changes from 256 to 1. Upsampling is accomplished by deconvolution layers. During both feature extraction and image reconstruction, skip connections link layers of the same scale. This operation exchanges information between different layers to mine deep information of the image while retaining shallow features, and this information exchange improves network performance [63], [64].

Fig. 2. Illustration of our proposed YSRNet framework. y is the image with degradation. SR denotes the SR image. The green module represents the deconvolution process. The two blue modules represent the noise removal process. Arrows in the network represent data exchange, and the network is a fully symmetric structure.
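The Y-shaped feature extractor can be sketched at the level of tensor shapes. The NumPy code below is an illustrative assumption rather than the authors' implementation: weights are random, each scale transition is modeled as 2 × 2 average pooling followed by a 1 × 1 projection that doubles the channels (the text attributes both effects to the pooling layer), the TV branch receives the two gradient maps as channels, and all helper names are hypothetical. With a 256 × 256 input and channels (16, 32, 64, 128, 256), it reproduces the schedule above; the example uses a smaller input to stay fast.

```python
import numpy as np

# Shape-level sketch of the Y-shaped encoder; names and weights are hypothetical.


def conv3x3(x, w):
    """'Same' 3x3 convolution (cross-correlation). x: (C_in, H, W), w: (C_out, C_in, 3, 3)."""
    c_in, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    patches = np.stack([xp[:, i:i + h, j:j + wd] for i in range(3) for j in range(3)], axis=1)
    return np.einsum("oct,cthw->ohw", w.reshape(w.shape[0], c_in, 9), patches)


def res_block(x, w1, w2):
    """conv -> ReLU -> conv, plus an identity skip connection."""
    return x + conv3x3(np.maximum(conv3x3(x, w1), 0.0), w2)


def downsample(x, w_proj):
    """2x2 average pooling (halves H and W), then a 1x1 projection that doubles channels."""
    c, h, wd = x.shape
    pooled = x.reshape(c, h // 2, 2, wd // 2, 2).mean(axis=(2, 4))
    return np.einsum("oc,chw->ohw", w_proj, pooled)


def encoder(x, channels, rng):
    """One branch of the Y: head conv, a Res-block per scale, downsampling in between.
    Returns the deepest features and the per-scale features used by skip connections."""
    def weights(*shape):
        return rng.normal(0.0, 0.05, size=shape)

    feats = conv3x3(x, weights(channels[0], x.shape[0], 3, 3))
    skips = []
    for i, c in enumerate(channels):
        feats = res_block(feats, weights(c, c, 3, 3), weights(c, c, 3, 3))
        skips.append(feats)
        if i + 1 < len(channels):
            feats = downsample(feats, weights(channels[i + 1], c))
    return feats, skips


rng = np.random.default_rng(0)
image = rng.random((1, 32, 32))                            # deep-prior branch input
grads = np.stack([np.diff(image[0], axis=1, append=image[0][:, -1:]),
                  np.diff(image[0], axis=0, append=image[0][-1:, :])])  # TV branch input
chans = (4, 8, 16, 32, 64)                                 # (16, ..., 256) in the text
deep, deep_skips = encoder(image, chans, rng)
tv, _ = encoder(grads, chans, rng)
fused = np.concatenate([deep, tv], axis=0)                 # channel concatenation, not addition
```

Concatenation keeps both branches' channels side by side (here 64 + 64 = 128), whereas addition would mix them into 64 channels; how the fused features are reduced before the decoder is not specified in the text and is left out of the sketch.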
The deep prior features and TV prior features are shown in Figs. 4 and 5, respectively. The deep prior mainly extracts overall features, whereas the TV prior mainly extracts edge features. TV features effectively enhance the edge information of the image, which improves the performance of the P module.

C. End-to-End Training
This network is trained end to end. Each iteration contains one D module and one P module, so an N-iteration YSRNet contains N Ynet blocks. The flows of YSRNet and Ynet are shown in Algorithms 1 and 2, respectively. The original HR images are used to compute the loss function, so that the SR images gradually approximate the original HR images.

D. Enhanced Version: YSRNet+
Because different features and channels in YSRNet share the same weight, we propose an enhanced version, YSRNet+. Since each convolution kernel of a neural network already carries weight information to some extent, YSRNet can also learn weights for different prior terms to some extent; however, this ability is limited. As a weight-learning method, the attention mechanism enables the network to learn such weights better, so we apply an attention module in YSRNet+. With this module, different weights can be applied to different prior terms and channels, and the model focuses more on important features. The module also explains the role of α and β in (4): α and β balance the loss between the reconstruction term and the prior terms, which is also the function of the attention module. Thus, the attention module further enhances the interpretability of the network.

Algorithm 1: YSRNet. (Noise removal and image output: X → Conv(3×3) → Upsampling module I → Upsampling module II → Upsampling module III → Upsampling module IV → X; return denoised images X.)

The attention module extracts the more valuable features of the image and assigns them larger weights, which benefits image reconstruction. The Res-block combined with the attention block is shown in Fig. 6. We map the features of different channels to weights through a simple network so that the network can learn the importance of each feature. The Res-blocks are applied in [...]. The function of the attention Res-block module is given in (14), where conv, relu, pooling, and full_con denote the convolution layer, activation layer, pooling layer, and fully connected layer, respectively, and (14) corresponds to Fig. 6. The second term of (14) denotes the attention weight, which is computed by a fully connected layer and a pooling layer. As shown in Algorithm 3, the deep prior and TV prior branches both have n feature maps in the network. According to (14), each feature map is multiplied by its corresponding weight, and $\sum_{i=1}^{n} w_{x,i}$ and $\sum_{i=1}^{n} w_{TV,i}$ correspond to α and β in (4), respectively. We concatenate the deep prior and TV prior feature channels for feature fusion, and all the fused features are then used for image reconstruction.

IV. EXPERIMENT

We evaluate the performance of the proposed networks on three remote sensing datasets to demonstrate the algorithm's effectiveness.
The UC Merced Land-Use dataset is a common aerial dataset containing 21 scene classes with 100 images of 256 × 256 pixels per class. The pixel resolution of this dataset is 1 ft. Because training on all the images is time-consuming, we select the first ten scene categories and use 40 images from each as training samples. Then, five images per class are selected as testing samples to evaluate reconstruction performance. In total, we use 400 images for training and 50 for testing. The second aerial dataset is WHU-RS19, which was collected from Google Earth by Wuhan University. It contains 19 categories of physical scenes in satellite imagery, including airport, beach, bridge, commercial, desert, river, and so on. To test the universality of the algorithm, we also use the NWPU-RESISC45 dataset. NWPU-RESISC45 was created by Northwestern Polytechnical University and is large in both the number of scene classes and the total number of images. It exhibits large variations in translation, spatial resolution, viewpoint, object pose, illumination, and so on. For this dataset, we select the first forty scene categories, using 30 images from each for training and 2 for testing.
Examples from the three datasets are shown in Fig. 7. The experiments are implemented in the PyTorch framework and trained on NVIDIA Titan RTX GPUs. Adam is selected as the optimizer. Each training image is cropped into 48 × 48 patches as the input. The learning rate is initialized to 0.0005, and the mini-batch size is set to 48. We use the L1 loss to favor PSNR performance. Training the YSRNet model takes about 20 h.
In addition to YSRNet and YSRNet+, several advanced SR algorithms are compared, including DPSR [65], IMDN [66], MAN [68], HSENet [69], and USRNet [54]. DPSR proposes a deep plug-and-play SR framework to solve the SISR problem; it considers multiple blurring kernels and supports existing deblurring methods for blurring kernel estimation. IMDN proposes a multidistillation module that targets low memory consumption and real-time performance. MAN is an attention network built on a multiscale large kernel attention structure. HSENet, a hybrid-scale self-similarity exploitation network, was proposed for optical remote sensing image SR. USRNet presents a learnable unfolding network and is the first to solve multiple degradation problems with a single end-to-end model. We use FLOPs (floating point operations) to evaluate the computational complexity of the algorithms. The FLOPs of DPSR, IMDN, MAN, HSENet, USRNet, YSRNet, and YSRNet+ are 52.07 G, 7.67 G, 9.26 G, 18.82 G, 9.08 G, 7.13 G, 11.00 G, and 11.17 G, respectively. Owing to the double priors, the computational complexity of the proposed algorithm increases. Although the attention mechanism adds computation, YSRNet+ requires only slightly more FLOPs than YSRNet while improving performance.
Owing to space limitations, we use only PSNR to evaluate the algorithms. PSNR is an image quality index defined by the MSE between the ground-truth image and the generated SR result:
$$\mathrm{PSNR} = 20\log_{10}\frac{\mathrm{Max}(I_{HR})}{\sqrt{\mathrm{MSE}(I_{HR}, G(I_{LR}))}}$$
where $\mathrm{Max}(I_{HR})$ denotes the maximum pixel value in the original images and $G(I_{LR})$ denotes the SR result generated from the LR input. A larger PSNR indicates better performance.

Fig. 7. Illustration of the datasets. We select 10 types of images from the UC Merced Land-Use and WHU-RS19 datasets, respectively, as training samples, and randomly select five images of each class as testing samples. We select 40 types of images from the NWPU-RESISC45 dataset as training samples and randomly select two images of each class for testing.

In what follows, we verify the validity and universality of the proposed algorithms on three remote sensing datasets.
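The PSNR definition above translates directly into code. This is a minimal sketch: by default it takes the peak from the reference image, as the definition states, while many works instead fix the peak to 255 or 1.0, which the `max_val` argument allows. The function names are illustrative and not from the paper.

```python
import numpy as np

# Illustrative PSNR helpers; names are hypothetical, not from the paper.


def psnr(hr, sr, max_val=None):
    """PSNR = 20 * log10(Max(I_HR) / sqrt(MSE(I_HR, SR))).

    By default Max(I_HR) is the maximum pixel value of the reference image;
    pass max_val (e.g. 1.0 or 255) to use a fixed peak instead.
    """
    hr = np.asarray(hr, dtype=np.float64)
    sr = np.asarray(sr, dtype=np.float64)
    mse = np.mean((hr - sr) ** 2)
    if mse == 0.0:
        return float("inf")                     # identical images
    peak = hr.max() if max_val is None else max_val
    return float(20.0 * np.log10(peak / np.sqrt(mse)))


def mean_psnr(pairs):
    """Average PSNR over (hr, sr) pairs, e.g. over a test set."""
    return float(np.mean([psnr(hr, sr) for hr, sr in pairs]))


hr = np.ones((4, 4))
print(round(psnr(hr, hr - 0.1), 6))   # 20.0 (MSE = 0.01, peak = 1)
```

Reporting `mean_psnr` over all test images corresponds to the averaged PSNR used in the comparisons below.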

A. Comparison With Different Methods on Dataset 1
In this section, we use the UC Merced Land-Use dataset to verify the validity of the proposed algorithms. As Fig. 7 shows, 400 images are used for training and 50 images for testing. The size of the training images is 180 × 180, and the SR scales are 2, 3, and 4. The noise level is set to σ = 0, σ = 3, and σ = 7. The noise n in (1) is usually assumed to be additive white Gaussian noise with intensity σ/255: we first generate a noise matrix with mean 0 and standard deviation σ/255 and then add it to the image, so different standard deviations produce noise of different intensities. Images with different noise levels are shown in Fig. 8. We set various noise levels to verify the algorithm's robustness under different conditions. We generate the training pairs $\{x_i, y_i\}$ by first extracting randomly cropped image blocks (48 × 48 pixels each). Two data augmentation methods, flipping and rotation, are used to enlarge the training data. The number of unfolding stages is 6 (N = 6). Ten blurring kernels from [54] are used to verify the algorithm's performance under different degradation conditions: 4 isotropic Gaussian kernels, 4 anisotropic Gaussian kernels, and 2 motion blurring kernels. Although complex motion blurring kernels are generally not considered for aerial images, they are included to verify the effectiveness of YSRNet and YSRNet+ more fully.
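The patch extraction, augmentation, and noise steps described above can be sketched as follows. The text states only that flipping and rotation are used; generating all eight flip/rotation combinations, the crop helper, and the function names are assumptions made for illustration. In the degradation pipeline, the noise is added to the blurred and downsampled LR image, so `add_noise` is shown separately rather than applied to the HR patches.

```python
import numpy as np

# Illustrative data-preparation helpers; names are hypothetical, not from the paper.


def random_crop(img, size, rng):
    """Randomly crop a size×size block from a 2-D image."""
    h, w = img.shape
    i = rng.integers(0, h - size + 1)
    j = rng.integers(0, w - size + 1)
    return img[i:i + size, j:j + size]


def augment(patch):
    """The 8 flip/rotation variants of a patch (rotations by 0/90/180/270 degrees,
    each with and without a left-right flip)."""
    variants = []
    for quarter_turns in range(4):
        r = np.rot90(patch, quarter_turns)
        variants.extend([r, np.fliplr(r)])
    return variants


def add_noise(y, sigma, rng):
    """Add zero-mean white Gaussian noise with standard deviation sigma/255 (images in [0, 1])."""
    return y + rng.normal(0.0, sigma / 255.0, size=y.shape)


rng = np.random.default_rng(0)
image = rng.random((180, 180))                      # one training image, values in [0, 1]
patches = augment(random_crop(image, 48, rng))      # eight augmented 48×48 HR patches
```

Each augmented HR patch would then be degraded (blur, downsampling, and `add_noise` with σ ∈ {0, 3, 7}) to form an LR/HR training pair.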
The first eight kernels are Gaussian blurring kernels, and the last two are complex motion blurring kernels. Compared with natural images, the information entropy of remote sensing images varies significantly from scene to scene. Simple remote sensing images, such as agricultural scenes, have low information entropy, while complex images, such as buildings, have high information entropy. Therefore, the stability requirements on the algorithm are more stringent. In the tables, the best results are shown in bold and the second-best results are underlined. The last two methods are the algorithms proposed in this article, whose names are also shown in bold.
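The article does not state how information entropy is measured. A common choice, assumed here, is the Shannon entropy of the gray-level histogram. Under this measure, a nearly uniform agricultural scene concentrates its histogram in a few bins and would tend to score low, while a busy building scene spreads over many bins and would tend to score high. The function name `image_entropy` is hypothetical.

```python
import numpy as np

def image_entropy(img: np.ndarray, bins: int = 256) -> float:
    """Shannon entropy (bits) of the gray-level histogram of an 8-bit image."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]  # treat 0 * log(0) as 0
    return float(-(p * np.log2(p)).sum())
```

A constant image has an entropy of 0 bits, and an image that uses all 256 gray levels equally often reaches the maximum of 8 bits.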
The PSNR results are presented in Table I. YSRNet and YSRNet+ still stand out under degradation by different blurring kernels. We report the average PSNR over the 50 testing images, which makes the results more stable. Traditional neural networks adapt poorly to different blurring kernels and perform well only with no blurring or with small blurring kernels. Fig. 9 shows the SR results of different algorithms with a scale factor of 3. DPSR and IMDN have similar SR performance, but IMDN trains faster because its lightweight multidistillation network has fewer parameters. The edges recovered by MAN are not sharp enough. USRNet is also a deep unfolding method and can fully use the information of the prior blurring kernels. The images produced by USRNet, YSRNet, and YSRNet+ are clearer than the others, and among them, YSRNet+ handles the details best. On average, the PSNR of YSRNet+ is 0.08, 0.18, and 0.22 dB higher than that of USRNet for ×2, ×3, and ×4 SR, respectively. Except for the first and last kernels, YSRNet+ obtains the highest PSNR, which demonstrates the effectiveness of the proposed network architecture.
Compared with USRNet, YSRNet has more parameters and takes longer to train because of the module that processes the TV prior term. In practice, testing time is more critical, so a moderate increase in training time is acceptable. Similarly, the attention module introduced in YSRNet+ slightly increases the training time. The overall visual quality of USRNet is similar to that of the proposed algorithms, but YSRNet and YSRNet+ recover details better. In addition, YSRNet+ is more interpretable.
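The attention module in YSRNet+ applies weights to different prior terms and channels. The sketch below shows that idea in generic form. It is an assumption and not the module described in this article. `channel_attention` is a standard squeeze-and-excitation-style block, and `fuse_priors` is a hypothetical softmax weighting over prior branches, for example a deep-prior feature map and a TV-prior feature map. In a real network, the matrices `w1` and `w2` and the logits would be learned. Here they are plain arrays, so the example shows only the forward computation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(feat, w1, w2):
    """SE-style channel attention on a (C, H, W) feature map.

    Global average pooling -> linear reduce -> ReLU -> linear expand -> sigmoid,
    then each channel is rescaled by its weight in (0, 1).
    """
    squeeze = feat.mean(axis=(1, 2))         # (C,)
    hidden = np.maximum(w1 @ squeeze, 0.0)   # (C // r,)
    weights = sigmoid(w2 @ hidden)           # (C,)
    return feat * weights[:, None, None], weights

def fuse_priors(prior_feats, prior_logits):
    """Hypothetical softmax weighting over prior-term branches."""
    a = np.exp(prior_logits - prior_logits.max())
    a = a / a.sum()
    fused = sum(ai * f for ai, f in zip(a, prior_feats))
    return fused, a
```

The extra cost of such a block is only two small weight matrices per stage. That is consistent with the small training-time overhead noted above, although the actual parameter counts depend on the real design.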

B. Comparison With Different Algorithms on Dataset 2
In this section, we use another remote sensing dataset to further verify the effectiveness of the algorithms. The training and testing samples are generated from WHU-RS19 dataset. The PSNR results are presented in Table II. YSRNet and YSRNet+ still perform well. In addition, several testing samples are selected for detailed analysis.
Since multiple degradation factors are considered in this article, several specific cases are selected for algorithm comparison. The degradation factors are listed in Table III. We choose different scenes, scale factors, noise levels, and kernels to demonstrate the effectiveness of the proposed algorithms. Fig. 10 shows the simplest scene, an aerial image of an industrial building from the WHU-RS19 dataset with a size of 384 × 384. A part of the image is enlarged to show the SR effect of different algorithms more clearly. The scale factor is 3, and the noise level is 5. The image is blurred by kernel 5, which is an anisotropic Gaussian kernel. The LR result is obtained through nearest-neighbor interpolation, and its PSNR is only 15.49 dB. The different algorithms improve the PSNR to varying degrees. The results of DPSR, IMDN, and HSENet have somewhat blurred edges, although these methods roughly restore the shape of the target and suppress noise well. MAN loses some details of the image. USRNet restores the information of the original image better, but it does not perform as well as YSRNet and YSRNet+ on detail textures and small target edges. YSRNet produces a clearer image with cleaner noise removal, but small objects in the image remain blurry. YSRNet+ achieves better SR performance and further improves the PSNR over YSRNet.
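For the LR baseline, the low-resolution image has to be brought back to the HR grid before it can be displayed next to the SR results or scored with PSNR. A minimal sketch of nearest-neighbor interpolation by an integer factor is shown below. The name `nearest_upsample` is hypothetical, and the exact interpolation routine used in the experiments is not specified in the text.

```python
import numpy as np

def nearest_upsample(lr: np.ndarray, scale: int) -> np.ndarray:
    """Nearest-neighbor upsampling: each LR pixel becomes a scale x scale block."""
    return np.repeat(np.repeat(lr, scale, axis=0), scale, axis=1)
```

For the ×3 case in Fig. 10, a 128 × 128 LR image maps back to the 384 × 384 HR size.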
In Fig. 11, more noise is added (σ = 9). DPSR and HSENet can roughly restore the HR image. Although IMDN restores the general outline of the object, it does not remove the noise cleanly and produces stripes. As a result, its PSNR is only 0.8 dB higher than that of the LR image. Compared with YSRNet, USRNet does not restore road textures very well. However, YSRNet also produces a few interference stripes. YSRNet+ performs best among these SR algorithms. Fig. 12 shows the most complex scene for ×3 SR. All methods except IMDN reduce noise well. For freeway scenes, YSRNet and YSRNet+ perform equally well, and their PSNRs are 0.5 dB higher than that of USRNet.
Figs. 13 and 14 show dense residential and parking lot scenes, respectively. These tasks are more challenging because the scale factor is 4. For the textures in Fig. 13, YSRNet performs better, and some small cars can also be restored. In Fig. 14, a vehicle-dense area is enlarged. Because of the difficulty of the task, none of the algorithms produces a sufficiently clear result, and USRNet, YSRNet, and YSRNet+ look similar. PSNR of
In summary, all algorithms perform noise removal, deblurring, and image SR to some degree. DPSR and HSENet perform similarly, and both have difficulty restoring detailed textures. IMDN computes faster, but its performance decreases at high noise levels. MAN may also lose some details during image reconstruction. USRNet is stable and can handle various degradation factors. YSRNet+ performs best and offers better interpretability.

C. Universality of the Proposed Algorithms on Dataset 3
The PSNR results on the third dataset are shown in Table IV. Each PSNR is the average over 80 testing samples. The PSNR results demonstrate that the proposed algorithms outperform the competing methods. On average, YSRNet+ is 0.20, 0.37, and 0.47 dB higher than USRNet for ×2, ×3, and ×4 SR, respectively. Traditional neural network methods perform better only for the first kernel. The proposed algorithms perform well, especially for complex blurring kernels, which indicates that they can fully use the image's prior information and adapt to different blurring kernels. Traditional neural network methods perform well under degradation by small blurring kernels, but their performance drops dramatically as the blurring kernels become more complex. YSRNet and YSRNet+ perform well in all situations.
The visual results of different algorithms are shown in Fig. 15. The last three methods are unfolding-based algorithms, and their results show that unfolding-based methods achieve better SR for complex blurring kernels. DPSR, IMDN, and HSENet cannot remove image blurring very well. MAN may lose some details. USRNet and the proposed methods are suitable for various blurring kernels. YSRNet recovers finer details than USRNet owing to the introduction of gradient channels. The attention module can further enhance YSRNet.
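The gradient channels and the TV prior mentioned above can be made concrete with a short sketch. It assumes forward differences with a circular boundary and shows both the anisotropic (L1) and isotropic TV forms. The function names are hypothetical, and the exact TV formulation used inside YSRNet is not given in this section.

```python
import numpy as np

def gradient_channels(x: np.ndarray):
    """Horizontal and vertical forward differences with a circular boundary."""
    gh = np.roll(x, -1, axis=1) - x  # x[i, j+1] - x[i, j]
    gv = np.roll(x, -1, axis=0) - x  # x[i+1, j] - x[i, j]
    return gh, gv

def total_variation(x: np.ndarray, isotropic: bool = False) -> float:
    """TV of an image: sum of |gh| + |gv| (anisotropic) or sqrt(gh^2 + gv^2) (isotropic)."""
    gh, gv = gradient_channels(x)
    if isotropic:
        return float(np.sqrt(gh ** 2 + gv ** 2).sum())
    return float((np.abs(gh) + np.abs(gv)).sum())
```

Flat regions contribute nothing to the TV value, and only edges do. This is why the gradient channels emphasize structure, and also why reconstructing from the TV prior alone can lose non-edge information.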
Because the proposed algorithm is derived from variable splitting, it follows the iterative update process of the corresponding optimization algorithm. The number of iteration layers is set to six, so the image reconstruction process is divided into six steps, and each iteration brings the intermediate result closer to the HR image. The experimental results are shown in Fig. 16. The initial image is obtained by nearest-neighbor interpolation of the downsampled image, with a downsampling factor of 3, and $x_k$ denotes the output of the kth iteration layer. As Fig. 16 shows, the image becomes clearer as the number of iterations increases, which indicates that the network behaves similarly to traditional optimization algorithms. The experimental results also show that the PSNR increases as the number of layers increases.
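To illustrate what "unfolding a variable-splitting iteration" means, the sketch below runs a classical, non-learned half-quadratic splitting (HQS) loop for six iterations. It makes strong simplifying assumptions. It handles pure deblurring without downsampling, uses circular boundaries and a fixed penalty `mu`, and replaces the learned R module with a hypothetical `prior_step` that applies mild neighborhood smoothing. In the proposed networks, the R module instead uses deep and TV priors. The closed-form `data_step` plays the role of the D module. This is not the authors' implementation.

```python
import numpy as np

def psf2otf(k, shape):
    """FFT of a blur kernel zero-padded to `shape`, centered at the origin."""
    pad = np.zeros(shape)
    kh, kw = k.shape
    pad[:kh, :kw] = k
    pad = np.roll(pad, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.fft2(pad)

def data_step(y, K, z, mu):
    """D step: argmin_x ||y - k*x||^2 + mu*||x - z||^2, solved in the Fourier domain."""
    Y, Z = np.fft.fft2(y), np.fft.fft2(z)
    X = (np.conj(K) * Y + mu * Z) / (np.abs(K) ** 2 + mu)
    return np.real(np.fft.ifft2(X))

def prior_step(x, strength=0.2):
    """R step stand-in: mild 4-neighbor smoothing (a learned module in unfolding networks)."""
    nb = (np.roll(x, 1, 0) + np.roll(x, -1, 0) + np.roll(x, 1, 1) + np.roll(x, -1, 1)) / 4.0
    return (1.0 - strength) * x + strength * nb

def hqs(y, k, n_iters=6, mu=0.01):
    """Alternate D and R steps; `history` holds the intermediate estimates x_1..x_N."""
    K = psf2otf(k, y.shape)
    z = y.copy()  # initialization (no downsampling in this toy setting)
    history = []
    for _ in range(n_iters):
        x = data_step(y, K, z, mu)
        z = prior_step(x)
        history.append(z)
    return z, history
```

In an unfolding network, each loop iteration becomes one layer with its own learned parameters. This is why `history` corresponds to the intermediate results $x_k$ visualized in Fig. 16.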
In summary, the proposed YSRNet and YSRNet+ algorithms are effective and perform well on standard datasets, especially for complex blurring kernels.

D. Ablation Experiment
The ablation experiments are shown in Table V, where "ave." denotes the average over all SR scales and noise levels. Because the proposed algorithm builds on the USRNet algorithm, we perform the ablation experiments with USRNet as the base model. The improvements fall into two parts: the attention module and YSRNet. YSRNet+ combines the attention module with YSRNet. "TV prior" denotes the model that uses only TV prior features for image reconstruction. As shown in Table V, the attention module and YSRNet each improve performance to different degrees, which verifies the effectiveness of the proposed modules. Because the TV prior extracts the edge information of the target image, some image information may be lost. Therefore, reconstruction using only the TV prior gives limited results. The base model, the base model with attention, YSRNet, and YSRNet+ have 592 K, 597 K, 1695 K, and 1707 K parameters, respectively. Therefore, the parameter
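The relative cost of each component follows directly from the reported parameter counts. The short calculation below uses only the four numbers stated above and reports the overhead of the attention module and the growth from the base model to YSRNet.

```python
# Parameter counts (in thousands) reported in the ablation study.
params_k = {"base": 592, "base+attention": 597, "YSRNet": 1695, "YSRNet+": 1707}

attn_over_base = params_k["base+attention"] - params_k["base"]  # 5 K
attn_over_ysr = params_k["YSRNet+"] - params_k["YSRNet"]        # 12 K

print(f"attention on base:   +{attn_over_base} K ({100 * attn_over_base / params_k['base']:.2f}%)")
print(f"attention on YSRNet: +{attn_over_ysr} K ({100 * attn_over_ysr / params_k['YSRNet']:.2f}%)")
print(f"YSRNet / base size ratio: {params_k['YSRNet'] / params_k['base']:.2f}x")
# attention on base:   +5 K (0.84%)
# attention on YSRNet: +12 K (0.71%)
# YSRNet / base size ratio: 2.86x
```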

V. CONCLUSION
This article introduces a double prior unfolding strategy for remote sensing image SR. Unlike previous works, which rely on uninterpretable networks, we propose a double prior unfolding network with interpretability for remote sensing image SR. The D module reconstructs the image and makes it clearer, and the R module uses the deep prior and the TV prior to remove noise. We combine a traditional optimization algorithm with a neural network and design a network (YSRNet) that has both the flexibility of traditional algorithms and the strong fitting ability of neural networks. This unfolding strategy enables the algorithm to better handle multidegradation SR tasks. In addition, we propose an enhanced version (YSRNet+) that introduces an attention module into the algorithm. Unlike commonly used attention modules, this attention module enables the network to focus on more important features. YSRNet+ enhances the interpretability of the algorithm while improving on the performance of YSRNet. Multiple experiments are performed to evaluate the two proposed networks, from which the following conclusions can be drawn.
1) The proposed YSRNet and YSRNet+ algorithms perform better than state-of-the-art methods. The learning rate of the network should be chosen carefully to avoid slow convergence or nonconvergence.
2) As the number of iteration layers increases, the number of model parameters increases significantly, so an appropriate number of iteration layers should be chosen to balance the performance and efficiency of the algorithm.
3) When the number of channels in the network is large enough, the performance gain from the attention module becomes less noticeable, although the module still helps.
Based on the remaining limitations of the network, our future work will focus on the following three aspects.
1) We will consider using more prior information from traditional methods to better combine traditional optimization algorithms with neural networks.
2) The network framework will be improved to enhance its fitting ability.
3) We will consider designing lightweight networks to reduce the number of network parameters and the training time.