Super-Resolution Reconstruction of Remote Sensing Images Using Generative Adversarial Network With Shallow Information Enhancement

Super-resolution (SR) reconstruction methods based on deep learning can significantly improve the spatial resolution of remote sensing images. However, current methods make insufficient use of remote context information and channel information during shallow feature extraction, which limits the effect of SR reconstruction. This article proposes a new SR reconstruction model, SIEGAN, which uses a generative adversarial network with shallow information enhancement to improve the SR reconstruction of remote sensing images. Like other generative adversarial models, SIEGAN is composed of a generator and a discriminator, but SIEGAN enhances the generator's ability to extract shallow information by using convolution operations at three different scales. Specifically, a depthwise convolution extracts the local context information of each band of the image, a depthwise dilation convolution captures the remote context information in the image, and a 1×1 convolution extracts the correlation features between different channels of remote sensing images. In addition, SIEGAN uses a U-Net network as its discriminator to provide per-pixel feedback to the generator, improving the model's ability to identify image details, and the spectral-spatial total variation loss function is introduced to ensure the spectral-spatial reliability of the reconstructed images. Experimental results on Gaofen-1 data show that, compared with state-of-the-art models, SIEGAN achieves better SR reconstruction performance. Furthermore, the images reconstructed by SIEGAN yield better performance in land cover classification.


I. INTRODUCTION
SATELLITE remote sensing images provide important and effective information for earth surface monitoring and are widely used in target detection [1], land cover classification [2], disaster early warning [3], urban economic level assessment [4], ocean exploration [5], single-modality [6], and multimodal classification [7], [8], [9]. However, due to the limitations of the environment, imaging equipment, and other factors, the spatial resolution of the acquired remote sensing images is limited, which affects their usefulness in practical tasks. Moreover, hyperspectral images collected from airborne or satellite sources inevitably suffer from spectral variability issues [10], resulting in low-quality images that cannot meet actual needs. Image super-resolution (SR) technology can improve the resolution of the original images without changing the hardware of the imaging system. By generating clearer high-resolution reconstructed images, it can meet the needs of various applications for high-resolution remote sensing images. Image SR reconstruction refers to the process of improving the spatial resolution of images, first proposed by Harris et al. [11] in the 1960s. Through SR reconstruction, the pixel density and detail information of images can be significantly improved. In recent years, image SR reconstruction combined with deep learning has become mainstream. Dong et al. [12] proposed the SR convolutional model SRCNN. Compared with traditional image SR reconstruction models, SRCNN achieves a higher peak signal-to-noise ratio (PSNR). Subsequently, more models based on deep learning have been proposed, such as FSRCNN [13], VDSR [14], EDSR [15], and FormResNet [16]. However, these models use the mean square error (MSE) as the loss function, which causes the SR reconstructed images to lose many high-frequency details. Ledig et al.
[17] proposed the SRGAN model based on the generative adversarial network (GAN), which uses a pretrained VGG network to calculate the perceptual loss so as to generate more realistic SR reconstructed images. Based on SRGAN, Wang et al. [18] proposed the ESRGAN model, which uses several residual-in-residual dense blocks (RRDBs) as its basic structure. In addition, Liang et al. proposed the SwinIR model [19] to reconstruct SR images based on the transformer mechanism.
Recently, models based on deep learning have also been extended to the field of SR reconstruction of satellite remote sensing images. However, different from natural images, satellite remote sensing images contain richer texture information, such as mountains, buildings, and rivers, which means that the high-to-low-resolution mapping learned from natural images cannot be directly used for SR reconstruction of remote sensing images. Liebe et al. [20] retrained the SRCNN model with Sentinel-2 data to learn the mapping relationship between high- and low-resolution remote sensing images and proposed the msiSRCNN model to reconstruct remote sensing images. Haut et al. [21] proposed the RSRCAN model for reconstructing SR remote sensing images, which combines the attention mechanism with a residual network. Dong et al. [22] designed a dense sampling super-resolution model that broadens the feature channels in the SR reconstruction of remote sensing images to extract more feature information. Xiong et al. [23] improved the SRGAN model to strengthen its generalization ability for SR reconstruction of remote sensing images. Dong et al. [24] proposed a coupling network combining a CNN and hyperspectral unmixing to improve the spatial resolution of hyperspectral images in an unsupervised manner. Jiang et al. [25] proposed the EEGAN model, which uses the Laplace operator to enhance the edge information of the image. Luis et al. [26] further improved the ESRGAN model to increase the resolution of Sentinel-2 images to 2 m. In SR models based on deep learning, shallow feature extraction is mainly used to recover the structural information of the images, while deep feature extraction is mainly used to recover the detail information of the images; the two complement each other.
However, current SR reconstruction models only use a single-scale convolution operation to extract the shallow features of the images, resulting in insufficient extraction of shallow information, which limits the effect of SR reconstruction of remote sensing images.
To address the above problems, a new SR reconstruction model, named SIEGAN, is proposed to improve the SR reconstruction of remote sensing images. SIEGAN adopts a GAN as its backbone and improves its ability to extract shallow information from remote sensing images by introducing the idea of large kernel convolution. By decomposing the large kernel convolution into a depthwise convolution, a depthwise dilation convolution, and a 1 × 1 convolution, SIEGAN can extract the shallow features of remote sensing images at three different scales, providing sufficient initial information for the subsequent SR reconstruction process.
The contributions of this article are as follows.
1) A model named SIEGAN is proposed, which can significantly improve the quality of SR reconstruction of remote sensing images.
2) A strategy of extracting shallow features with convolution operations at three different scales is proposed to effectively enhance the shallow feature representation in image SR reconstruction.
3) U-Net is used as the discriminator of SIEGAN, which enables SIEGAN to judge the authenticity of each pixel, thus improving the model's ability to recognize image details.
4) The spectral-spatial total variation loss function is introduced to improve the spectral-spatial reliability of the SR reconstructed remote sensing images.
The rest of this article is organized as follows. Section II discusses the framework of SIEGAN, and its key modules. Section III gives the experimental results and Section IV concludes this article.

II. METHODOLOGY
In this article, a model named SIEGAN is proposed to solve the problem of insufficient extraction of shallow features in the SR reconstruction of remote sensing images. Similar to other GAN-based models, SIEGAN is composed of two components: a generator and a discriminator. The generator extracts shallow and deep features from the original low-resolution remote sensing images and generates high-resolution remote sensing images based on the fusion of these features. The discriminator distinguishes the images generated by the generator from the original high-resolution remote sensing images. The generator and discriminator networks are trained and updated iteratively to ensure that the model converges to convincing results. In this process, the full extraction of image features and the effective choice of discriminator affect the quality of image SR reconstruction. Fig. 1 shows the structure of the generator network in SIEGAN. The generator network is composed of a shallow feature extraction module, a deep feature extraction module, and a high-resolution image reconstruction module. In the shallow feature extraction module, three different convolution operations are used to extract shallow image features from multiple perspectives. RRDBs are used in the deep feature extraction module to extract the deep features of the images. Through a long-distance skip connection, the shallow and deep features are aggregated and input into the high-resolution image reconstruction module to generate the high-resolution remote sensing image.

A. Generator of SIEGAN Model
1) Shallow Feature Extraction: When extracting the shallow features of remote sensing images, most existing models focus on extracting the local features of the image while ignoring its long-distance correlation features and channel features. This limitation of shallow feature extraction affects the quality of image SR reconstruction. Inspired by MobileNet [27] and VAN [28], this article introduces the idea of large kernel convolution to extract shallow features from remote sensing images. The large kernel convolution is decomposed into three convolutions with different scales, each of which extracts shallow features at a specific scale. Specifically, a K × K large kernel convolution can be decomposed into a (2d−1) × (2d−1) depthwise convolution, a K/d × K/d depthwise dilation convolution, and a 1 × 1 convolution, where d denotes the dilation rate. The depthwise convolution extracts local contextual information in each band of the remote sensing images. The depthwise dilation convolution has a large receptive field and focuses on capturing the long-distance relationships in the image. The 1 × 1 convolution aggregates the information between the bands of the remote sensing images. These convolutions with different scales help the generator extract shallow features from different receptive fields, thus providing more abundant shallow features for the SR reconstruction process. Fig. 2 shows the structure of the shallow feature extraction module, where the gray grid represents the position of the centroid and the blue grid represents the position of the convolution kernel.
Given a low-resolution image I_LR ∈ R^(H×W×C_LR) (H, W, and C_LR represent the image height, width, and number of channels, respectively), the process of extracting the shallow feature F_SF ∈ R^(H×W×C) is

F_SF = Conv_1×1(Conv_DWD(Conv_DW(I_LR)))

where C is the number of feature channels, Conv_1×1 represents the channel convolution, Conv_DW represents the depthwise convolution, and Conv_DWD represents the depthwise dilation convolution.
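The decomposed large-kernel extraction can be sketched in PyTorch as follows. The class name is hypothetical, and the kernel size K = 21 with dilation rate d = 3 is an illustrative assumption borrowed from VAN [28]; the article does not fix these values in this excerpt.

```python
import torch
import torch.nn as nn

class ShallowFeatureExtractor(nn.Module):
    """Decomposed large-kernel convolution: depthwise conv (local context),
    depthwise dilated conv (long-range context), 1x1 conv (cross-channel)."""
    def __init__(self, channels, k=21, d=3):
        super().__init__()
        dw_k = 2 * d - 1                      # (2d-1) x (2d-1) depthwise kernel
        dwd_k = k // d                        # K/d x K/d dilated depthwise kernel
        self.conv_dw = nn.Conv2d(channels, channels, dw_k,
                                 padding=dw_k // 2, groups=channels)
        self.conv_dwd = nn.Conv2d(channels, channels, dwd_k, dilation=d,
                                  padding=(dwd_k // 2) * d, groups=channels)
        self.conv_1x1 = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        # F_SF = Conv_1x1(Conv_DWD(Conv_DW(I_LR))); padding keeps H and W fixed
        return self.conv_1x1(self.conv_dwd(self.conv_dw(x)))
```

Because every stage is padded, the spatial size of the input is preserved, so the module can sit in front of any deep feature extractor.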

2) Deep Feature Extraction:
In the deep feature extraction module, the deep features of remote sensing images are extracted from the obtained shallow features. The core of the deep feature extraction module is a series of RRDBs, which consist of multilevel residual networks and dense connections. Fig. 1(a) shows the structure of an RRDB. The RRDB uses dense blocks as its main component and multiplies the residual information by a constant β to enhance the stability of the network [29]. Fig. 1(b) shows the structure of the dense block. The dense block is composed of several convolution layers with the LReLU function as the activation function. Each layer in the dense block receives the feature maps of all preceding layers as its input, and its own feature map is used as input to all subsequent layers [30]. This dense connectivity can alleviate the problem of gradient vanishing, enhance the reuse of features, and reduce the number of parameters, so as to extract the deep features of the image more effectively.
Based on the extracted shallow feature F_SF, the deep feature F_DF ∈ R^(H×W×C) is obtained by the deep feature extraction module as

F_DF = H_DF(F_SF)

where H_DF denotes the deep feature extraction module, which contains a 3 × 3 convolutional layer and K RRDBs. The intermediate features F_i and the output deep feature F_DF are extracted as

F_i = H_RRDB_i(F_(i−1)), i = 1, 2, ..., K
F_DF = H_Conv(F_K)

where F_0 = F_SF denotes the shallow feature, H_RRDB_i denotes the ith RRDB, and H_Conv is a 3 × 3 convolutional layer.
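The RRDB structure described above can be sketched in PyTorch as follows. The number of convolutions per dense block (five), the growth rate gc, and the residual scaling β = 0.2 follow the ESRGAN design [18] and are assumptions, not values confirmed in this excerpt.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Five 3x3 convs with dense connections and LReLU, residual scaled by beta."""
    def __init__(self, nf=64, gc=32, beta=0.2):
        super().__init__()
        self.beta = beta
        # conv i sees the input plus the outputs of all preceding convs
        self.convs = nn.ModuleList(
            [nn.Conv2d(nf + i * gc, gc if i < 4 else nf, 3, padding=1)
             for i in range(5)])
        self.lrelu = nn.LeakyReLU(0.2)

    def forward(self, x):
        feats = [x]
        for i, conv in enumerate(self.convs):
            out = conv(torch.cat(feats, dim=1))
            if i < 4:
                out = self.lrelu(out)
                feats.append(out)
        return x + self.beta * out          # scaled residual for stability

class RRDB(nn.Module):
    """Residual-in-residual dense block: three dense blocks plus outer residual."""
    def __init__(self, nf=64, gc=32, beta=0.2):
        super().__init__()
        self.beta = beta
        self.blocks = nn.Sequential(DenseBlock(nf, gc, beta),
                                    DenseBlock(nf, gc, beta),
                                    DenseBlock(nf, gc, beta))

    def forward(self, x):
        return x + self.beta * self.blocks(x)
```

Stacking K such blocks followed by a 3 × 3 convolution gives the H_DF module described in the text.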

3) High-Resolution Image Reconstruction:
The fusion of shallow and deep features helps to improve the effect of SR reconstruction of remote sensing images. In the high-resolution image reconstruction module, a long-distance skip connection operation is used to aggregate the acquired shallow and deep features at the pixel level. The aggregated features are convolved twice and mapped into HR space to obtain the high-resolution reconstructed remote sensing image:

I_SR = H_HRG(F_SF + F_DF)

where H_HRG denotes the high-resolution image reconstruction module, which consists of two 3 × 3 convolutions and an activation function, I_SR denotes the generated SR image, and + denotes the skip connection.
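A sketch of the reconstruction module follows. The article only specifies two 3 × 3 convolutions and an activation, so the pixel-shuffle upsampling operator, the class name, and the channel counts are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class HRReconstruction(nn.Module):
    """Fuses shallow and deep features via a long skip connection, then maps
    them into HR space. Pixel-shuffle upsampling is an assumption here."""
    def __init__(self, nf=64, out_ch=4, scale=4):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(nf, nf * scale * scale, 3, padding=1),  # first 3x3 conv
            nn.PixelShuffle(scale),                           # LR -> HR grid
            nn.LeakyReLU(0.2),                                # activation
            nn.Conv2d(nf, out_ch, 3, padding=1))              # second 3x3 conv

    def forward(self, f_sf, f_df):
        # I_SR = H_HRG(F_SF + F_DF): pixel-level aggregation, then mapping
        return self.head(f_sf + f_df)
```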

B. Discriminator of SIEGAN Model
Remote sensing images contain rich texture and detail information. Making full use of this information to distinguish the authenticity of the reconstructed high-resolution remote sensing images helps to improve their quality. This article uses the U-Net [31] network as the discriminator of the SIEGAN model. U-Net can output a realness value for each pixel in the image, which improves the ability of SIEGAN to distinguish image details. Fig. 3 shows the structure of the discriminator in the SIEGAN model. The U-Net has a contracting path on the left and an expansive path on the right. The contracting path consists of several identical 3 × 3 convolutions, each followed by spectral normalization (spectral norm) and a downsampling operation. The spectral norm helps alleviate the artifacts caused by GAN training. The features obtained from the contracting path are fed into the expansive path. Each convolutional layer in the expansive path is also followed by a spectral norm; every step consists of an upsampling of the feature map, after which the resulting features are concatenated with the corresponding feature map from the contracting path and fed into a 3 × 3 convolutional layer. At the end of the network, a 1 × 1 convolution maps the feature vector to the feature space needed for determining the authenticity of each pixel in the image. At the same time, to avoid the loss of image boundary information that convolution operations easily cause, we add padding in the convolution layers to keep the scale of the feature maps unchanged, which further ensures the discriminator's ability to distinguish image details.
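A minimal sketch of a U-Net discriminator with spectral normalization and a per-pixel realness output is shown below; the depth (a single down/up step) and the channel widths are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

def sn_conv(cin, cout, stride=1):
    # 3x3 conv with spectral norm; padding preserves the spatial borders
    return spectral_norm(nn.Conv2d(cin, cout, 3, stride=stride, padding=1))

class UNetDiscriminator(nn.Module):
    """Contracting path, expansive path with a skip connection, and a final
    1x1 conv producing an (N, 1, H, W) per-pixel realness map."""
    def __init__(self, in_ch=4, nf=32):
        super().__init__()
        self.down1 = sn_conv(in_ch, nf)
        self.down2 = sn_conv(nf, nf * 2, stride=2)       # downsampling step
        self.up = nn.Upsample(scale_factor=2, mode="bilinear",
                              align_corners=False)
        self.up1 = sn_conv(nf * 2, nf)
        self.fuse = sn_conv(nf * 2, nf)                  # after skip concat
        self.out = spectral_norm(nn.Conv2d(nf, 1, 1))    # per-pixel score
        self.act = nn.LeakyReLU(0.2)

    def forward(self, x):
        d1 = self.act(self.down1(x))
        d2 = self.act(self.down2(d1))
        u1 = self.act(self.up1(self.up(d2)))
        f = self.act(self.fuse(torch.cat([u1, d1], dim=1)))  # skip connection
        return self.out(f)
```

The per-pixel map, rather than a single scalar, is what lets the generator receive detailed spatial feedback.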

C. Loss Functions
In this article, the L1 loss, perceptual loss, adversarial loss, and spectral-spatial total variation loss are used to train the model.
1) L1 Loss: The L1 loss, also known as the mean absolute error loss, is the mean of the absolute differences between the generated output G(x_i) and the target y_i for each ith image:

L_1 = (1/N) Σ_(i=1)^N |G(x_i) − y_i|

2) Perceptual Loss: A commonly used reconstruction loss is the MSE, which can achieve a high PSNR. However, MSE usually ignores high-frequency information, which causes the texture of the generated image to be too smooth. To solve this problem, Johnson et al. [32] introduced a loss function L_percep that is closer to perceptual similarity, which uses a pretrained VGG-19 [33] network as a feature extractor to compute the average error between two feature maps.
In this article, the pretrained VGG model is fine-tuned to calculate the perceptual loss over the four bands of the remote sensing images:

L_percep = (1/(C_j H_j W_j)) ‖ϕ_j(G(x)) − ϕ_j(y)‖_2^2

where ϕ_j denotes the jth-layer feature map of the pretrained VGG network, and C_j, H_j, and W_j denote the size of that feature map.
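The two losses can be sketched in NumPy as follows; `phi` is a hypothetical placeholder standing in for the pretrained VGG feature extractor, which is not reproduced here.

```python
import numpy as np

def l1_loss(pred, target):
    # mean absolute error over all pixels and bands
    return np.mean(np.abs(pred - target))

def perceptual_loss(pred, target, phi):
    # phi: any callable returning a (C_j, H_j, W_j) feature map; the paper
    # uses a fine-tuned VGG-19 network for this role
    fp, ft = phi(pred), phi(target)
    return np.sum((fp - ft) ** 2) / fp.size   # squared error / (C_j*H_j*W_j)
```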

3) Spectral-Spatial Total Variation Loss:
The spectral-spatial total variation loss function [34] considers both the spectral and spatial characteristics of satellite remote sensing images to ensure the spatial and spectral confidence of the reconstructed images. It is calculated as

L_SSTV = (1/N) (‖∇_h I_SR‖_1 + ‖∇_w I_SR‖_1 + ‖∇_c I_SR‖_1)

where ∇_h, ∇_w, and ∇_c denote the finite-difference operators along the height, width, and spectral dimensions of the reconstructed image I_SR, and N is the number of elements in the image cube.
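A NumPy sketch of the standard SSTV form follows, assuming equal weighting of the three difference terms; the paper's exact weighting is not shown in this excerpt.

```python
import numpy as np

def sstv_loss(x):
    """Spectral-spatial total variation of an image cube x of shape (H, W, C):
    L1 norms of finite differences along height, width, and spectral axes,
    normalized by the cube size."""
    dh = np.abs(np.diff(x, axis=0)).sum()   # spatial variation along height
    dw = np.abs(np.diff(x, axis=1)).sum()   # spatial variation along width
    dc = np.abs(np.diff(x, axis=2)).sum()   # spectral variation across bands
    return (dh + dw + dc) / x.size
```

Penalizing all three terms discourages both spatial noise and band-to-band spectral distortion in the reconstruction.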

4) Adversarial Loss:
The adversarial loss of the generator is

L_adv^G = −E_(x_r)[log(1 − D_Ra(x_r, x_f))] − E_(x_f)[log(D_Ra(x_f, x_r))]

where x_r denotes the true HR image, x_f denotes the HR image generated by the generator, and D_Ra(a, b) = σ(C(a) − E_b[C(b)]) is the relativistic average discriminator built on the discriminator output C(·). The adversarial loss of the discriminator is calculated as

L_adv^D = −E_(x_r)[log(D_Ra(x_r, x_f))] − E_(x_f)[log(1 − D_Ra(x_f, x_r))]

The total loss of the generator is calculated as

L_G = λ_1 L_1 + λ_2 L_percep + λ_3 L_SSTV + λ_4 L_adv^G

where λ_1, λ_2, λ_3, and λ_4 denote the weighting coefficients, respectively.
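Assuming the relativistic average formulation popularized by ESRGAN (the article does not spell out the exact form in this excerpt), the adversarial losses can be sketched on raw discriminator scores as follows.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_ra(c_a, c_b):
    # relativistic average score: how much more realistic a looks than b,
    # on average, given raw discriminator outputs c_a and c_b
    return sigmoid(c_a - np.mean(c_b))

def g_adv_loss(c_real, c_fake, eps=1e-12):
    # generator wants fakes rated above reals
    return (-np.mean(np.log(1 - d_ra(c_real, c_fake) + eps))
            - np.mean(np.log(d_ra(c_fake, c_real) + eps)))

def d_adv_loss(c_real, c_fake, eps=1e-12):
    # discriminator wants the opposite ranking
    return (-np.mean(np.log(d_ra(c_real, c_fake) + eps))
            - np.mean(np.log(1 - d_ra(c_fake, c_real) + eps)))
```

With a U-Net discriminator, `c_real` and `c_fake` would be the flattened per-pixel score maps rather than single scalars.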

III. EXPERIMENTS

A. Datasets
The remote sensing images used in our experiments are extracted from the Gaofen-1 satellite (GF-1). GF-1 is the first satellite of China's high-resolution earth observation system; it was successfully launched at 12:13:04 on April 26, 2013, by the Long March-2 launch vehicle. The GF-1 satellite is equipped with two panchromatic multispectral cameras (PMS) and four wide-field-view multispectral cameras (WFV). The images captured by PMS have higher resolution, while the images captured by WFV have relatively lower resolution. In this article, images generated by the GF-1 WFV cameras are used as the low-resolution images, and images generated by the GF-1 PMS cameras are used as the high-resolution images. Table I shows the parameters of the GF-1 satellite. Table II gives the technical specifications of the satellite payload, in which the multispectral images contain four bands: red, green, blue, and near-infrared. The red wavelength ranges from 0.63 to 0.69 μm; this band is absorbed by chlorophyll and can be used to assess the health status of green plants. The green wavelength range is 0.52-0.59 μm, which is mainly used to distinguish man-made feature types and assess crop production trends. The wavelength range of the blue band is 0.45-0.52 μm, which reflects information about vegetation and soil. The wavelength range of the near-infrared band is 0.77-0.89 μm, which can be used to distinguish nonvegetation from vegetation.
To create a training sample set containing pairs of high-resolution and low-resolution remote sensing images, two sensors are required to image common ground targets at the same position and in the same spectral bands. GF-1 satellite images of Shandong Province are selected as the image samples for our experiments. The red, green, blue, and near-infrared data in the PMS and WFV images are extracted, respectively. Shandong Province is rich in surface and ecological types, including different forms of woodland, shrub, grassland, farmland, buildings, desert, ice and snow, water, wetland, and other geomorphic types. The selected remote sensing images are preprocessed, including radiometric correction, atmospheric correction, and orthophoto correction. In addition, the WFV images are resampled to 8-m resolution to achieve registration with the PMS images. All images are cropped to 128 × 128 pixels and randomly assigned to the training set and the test set. Fig. 4 shows the true color display of some preprocessed remote sensing images, which were acquired in May 2019. It can be seen that in each pair of remote sensing images, the HR image is clearer than the LR image and contains more detailed information.

B. Metrics
The evaluation indexes of PSNR, structural similarity index (SSIM), spectral angle mapper (SAM), and local standard deviation (LSD) are used to evaluate the performance of SR reconstruction of remote sensing images.
1) PSNR: PSNR is the most common evaluation metric for image SR reconstruction. This index evaluates image quality based on the error between corresponding pixels. The larger the PSNR value, the smaller the distortion of the reconstructed image and the better the reconstruction.
PSNR (in dB) is calculated as

PSNR = 10 · log10(MaxVal^2 / MSE),  MSE = (1/(a·b)) Σ_(i=1)^a Σ_(j=1)^b (I_HR(i, j) − I_SR(i, j))^2

where MaxVal denotes the maximum pixel value of the image and a × b is the image size.
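A minimal NumPy implementation of PSNR:

```python
import numpy as np

def psnr(x, y, max_val=255.0):
    # PSNR = 10 * log10(MaxVal^2 / MSE), in decibels
    mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)
```

For 8-bit imagery, a per-pixel error of 1 gray level gives 10·log10(255²) ≈ 48.13 dB.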
2) SSIM: SSIM measures the similarity between images in terms of brightness, structure, and contrast, which better reflects the subjective perception of human eyes. SSIM is calculated as

SSIM(x, y) = ((2 μ_x μ_y + C_1)(2 σ_xy + C_2)) / ((μ_x^2 + μ_y^2 + C_1)(σ_x^2 + σ_y^2 + C_2))

where μ_x, μ_y, σ_x^2, σ_y^2, and σ_xy denote the means, variances, and covariance of images x and y, and the constants C_1 = (K_1 L)^2 and C_2 = (K_2 L)^2 depend on the dynamic range L of the pixel values. The defaults are K_1 = 0.01 and K_2 = 0.03. The value range of SSIM is [0, 1]; the closer it is to 1, the better the image quality.
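A single-window NumPy sketch of SSIM follows; production implementations compute the statistics over a sliding Gaussian window and average, so this global variant is a simplification.

```python
import numpy as np

def ssim(x, y, L=255.0, k1=0.01, k2=0.03):
    # global (single-window) SSIM over the whole image
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return (((2 * mx * my + c1) * (2 * cov + c2))
            / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))
```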
3) SAM: SAM regards the spectrum of each pixel as a high-dimensional vector and measures the similarity between spectra by calculating the angle between the two vectors; the smaller the angle, the more similar the two spectra. SAM is calculated as

SAM(x, y) = arccos( ⟨x, y⟩ / (‖x‖ · ‖y‖) )

where x and y are the spectral vectors of corresponding pixels in the two images, and the angle is averaged over all pixels.
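A NumPy sketch of the per-pixel spectral angle, averaged over the image:

```python
import numpy as np

def sam(x, y, eps=1e-12):
    """Mean spectral angle (radians) between per-pixel spectra.
    x, y: arrays of shape (H, W, C)."""
    num = np.sum(x * y, axis=-1)                                  # dot products
    den = np.linalg.norm(x, axis=-1) * np.linalg.norm(y, axis=-1) + eps
    # clip guards arccos against floating-point values slightly outside [-1, 1]
    return float(np.mean(np.arccos(np.clip(num / den, -1.0, 1.0))))
```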

4) LSD:
LSD reflects the degree of gray-level variation in local areas of an image. In remote sensing images, the same ground objects have similar spectral characteristics and thus similar gray values in the same band, and boundaries segment different ground objects. At image boundaries, the LSD is usually large due to the large gray-level variation between different objects. Therefore, the LSD index can also be used to measure how well the SR reconstructed image restores the details of the original image.
The LSD value of pixel (i, j) is calculated as

LSD(i, j) = sqrt( (1/K^2) Σ_((m,n)∈W_(i,j)) (f(m, n) − LM)^2 )

where f(m, n) is the gray value of pixel (m, n) in the K × K window W_(i,j) centered on (i, j), and LM represents the average gray value in the window (K = 3 in this article).
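A straightforward NumPy sketch of the per-pixel LSD with a K × K window; for simplicity, edge pixels whose window would fall outside the image are left at zero.

```python
import numpy as np

def lsd(img, k=3):
    """Local standard deviation of each interior pixel over a k x k window."""
    h, w = img.shape
    r = k // 2
    out = np.zeros((h, w), dtype=np.float64)
    for i in range(r, h - r):
        for j in range(r, w - r):
            win = img[i - r:i + r + 1, j - r:j + r + 1]
            out[i, j] = win.std()   # sqrt of mean squared deviation from LM
    return out
```

High LSD values trace object boundaries, which is why Fig. 7 visualizes this map to compare boundary restoration across models.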

C. Parameter Setting
The WFV data were resampled to 8-m resolution and co-aligned with the PMS images. The images are cropped to a size of 128 × 128 pixels, with adjacent crops overlapping by 10 pixels to avoid obvious splicing traces. A total of 7800 images are obtained through these operations and randomly divided into training and test samples. Finally, a training set of 5000 remote sensing images and a test set of 2800 remote sensing images are generated.
The Adam optimizer is used in the training process of SIEGAN, the learning rate is initialized to 2×10−4, and the mini-batch size is 8. SIEGAN is trained by alternately updating the discriminator and the generator. Specifically, our model is trained in two steps. In the first step, we train the generator by minimizing the L1 loss for 100K iterations, which gives the subsequent adversarial training a better initialization. Next, we train our model with the full generator and discriminator losses, where λ1 = 1, λ2 = 1e−3, λ3 = 1e−1, λ4 = 5e−3, the learning rate is 1×10−4, and the number of iterations is 200K.

D. Experimental Results and Analysis

1) Evaluation on Super-Resolution:
Several deep-learning-based SR reconstruction models are selected as benchmark comparison models, including the convolution-based model SRCNN; the residual-network-based models EDSR, RCAN [35], and VDSR; the GAN-based models ESRGAN and Real_ESRGAN [36]; and the transformer-based model SwinIR. For a fair comparison, these models are retrained on the remote sensing image dataset constructed in this article. Table III and Fig. 5 show the SR reconstruction performance of the different models under different metrics. In Table III, the best results are indicated in bold.
It can be seen from Table III and Fig. 5 that SIEGAN performs well under all evaluation indicators. In particular, SIEGAN achieves the best performance under the SSIM and SAM indicators. Under the PSNR index, the performance of SIEGAN is only slightly worse than that of the RCAN model. The good performance of SIEGAN across the indexes shows that it is feasible and effective to use SIEGAN for SR reconstruction of remote sensing images. Besides the SIEGAN model, SRCNN and Real_ESRGAN also achieve good performance on the three indicators, whereas the performance of EDSR and VDSR is relatively poor. Fig. 6 shows the visual effects of remote sensing image SR reconstruction under different models. Group (1) in Fig. 6 shows remote sensing images containing more flat areas, with relatively little detailed information. Group (2) shows remote sensing images with relatively more detailed information. From the visual effect of SR reconstruction on the two groups of remote sensing images, the SIEGAN model proposed in this article achieves a good visual effect, and the images reconstructed by the SIEGAN model are more realistic. In the images reconstructed by SIEGAN, the ground object information is restored more clearly, the contours are more clearly defined, and there are fewer artifacts. Therefore, combining the quantitative results in Table III and the qualitative visual effects in Fig. 6, the SIEGAN model proposed in this article achieves the best SR reconstruction performance compared with the other SR reconstruction models.
In addition, the RCAN and SRCNN models also perform well under the metrics, but the visual effect of the images they reconstruct is not as good as that of SIEGAN. The main reason is that the SR images reconstructed by these two models cannot produce rich details, and some even contain artificial artifacts. It is worth mentioning that the SR reconstructed images obtained by the SwinIR model do not achieve good quantitative performance values but do achieve a good visual effect. This shows that the quantitative performance indexes of SR reconstruction are not sufficient to fully characterize the effect of image SR reconstruction, and finding a better quantitative index that conforms to human visual perception remains a difficult problem in image SR reconstruction. Fig. 7 shows the effect of image SR reconstruction measured by the LSD index. The boundaries between different ground objects in the image can be displayed according to the change of the LSD value. It can be seen that the images reconstructed by the SIEGAN model generate more complete and clear boundaries, and these boundaries are closer to the boundaries in the original HR image. The other models are inferior to the SIEGAN model in reconstructing the remote sensing images: their reconstructed images show different degrees of boundary loss, especially SRCNN, EDSR, RCAN, VDSR, and ESRGAN. Although Real_ESRGAN and SwinIR can generate relatively clear boundaries, their restoration of the boundary information is worse than that of SIEGAN.
Each spectral band of a satellite remote sensing image is important. The red band can measure the health status of green plants. The green band can distinguish man-made features and evaluate crop growth trends. The blue band reflects soil and vegetation information. The main function of the near-infrared band is to distinguish vegetation from nonvegetation. The SIEGAN model considers the four spectral features of the WFV images when reconstructing SR images. We decompose the reconstructed remote sensing images into the four bands of red, green, blue, and near-infrared, and compare the reconstruction effect of the SIEGAN model on the four bands with the corresponding bands of the original low-resolution remote sensing image. Fig. 8 shows the effect of SR reconstruction of SIEGAN on each band. It can be seen that the images reconstructed by SIEGAN improve the clarity of all bands, and the contours of the ground features become clear. This shows that the SIEGAN model can not only improve the spatial resolution of remote sensing images but also maintain the spectral information of the images.
2) Evaluation on Super-Resolution With Land Cover Classification: Land cover classification is an important research topic in the field of remote sensing applications; in particular, monitoring land cover dynamics over large-scale areas is the basis of scientific research on global change and sustainable development. In this article, the remote sensing images before and after SR are applied to the task of land cover classification to further explore the reconstruction performance of the SR model. The remote sensing images used for land cover classification are selected from Shandong Province. The land cover classification labels follow the labels given in the 10-m resolution land cover data released by ESA in 2020. The data include 11 types of labels: woodland, shrub, grassland, cultivated land, architecture, desert, snow (ice), waterbody, wetland, mangrove, and moss. The remote sensing images selected in this experiment contain eight kinds of ground objects. The SegNet network [37] is used to perform the land cover classification task. The classification overall accuracy and the Kappa coefficient are used to evaluate the classification effect. The Kappa coefficient considers both missing and wrongly classified pixels; the higher the Kappa coefficient, the better the classification effect of the model. Table IV shows the land cover classification results of the remote sensing images. It can be seen that the classification results of the images reconstructed by the SR models are improved compared with the original low-resolution images. Among them, SIEGAN improves the overall accuracy over the image before reconstruction by 10% and increases the Kappa coefficient by 0.08, and the classification results of the images generated by SIEGAN are better than those of the other methods. It can also be seen from Fig. 9 that the classification of the SIEGAN reconstructed images is better than that of the original low-resolution images and the other advanced models in recognizing image boundaries and scattered objects. Combining the qualitative and quantitative results, the SIEGAN method proposed in this article performs the best and can significantly improve the performance of land cover classification, which also verifies the effectiveness of the model in improving the spatial resolution of remote sensing images.

E. Ablation Study
To verify the necessity of extracting shallow feature information with convolution operations at different scales in the SIEGAN model, ablation experiments are conducted in this article. On the basis of the SIEGAN model, the convolution operation at one specific scale is removed, and the SR reconstruction of remote sensing images is completed with the simplified model. Table V shows the results of the ablation experiments, and Fig. 10 shows their visualization. It can be seen that removing the convolution operation at any specific scale degrades the SR reconstruction effect of the model. This shows that the shallow information extracted by the three convolution operations plays an important role in improving the SR reconstruction of the image; that is, the idea proposed in this article of extracting the shallow information of an image from three different receptive fields is reasonable. From the visual effect shown in Fig. 10, the complete SIEGAN model generates clearer SR reconstructed images than the simplified models with one convolution operation removed. This further verifies the necessity of performing the three convolution operations in the SIEGAN model.

IV. CONCLUSION
Aiming at the problem that existing SR reconstruction models have limited ability to extract the shallow features of remote sensing images, an SR reconstruction model based on GAN, named SIEGAN, is proposed. SIEGAN uses three convolution operations with different scales in the generator to extract shallow information; by capturing more abundant shallow features, it enhances the shallow feature representation in the SR reconstruction process. SIEGAN uses U-Net as its discriminator, which outputs the per-pixel realness of the image to improve the model's reconstruction of image details. At the same time, SIEGAN introduces the spectral-spatial total variation loss function to ensure the spectral-spatial reliability of the remote sensing images. The experimental results show that, compared with the baseline models, the SR images reconstructed by SIEGAN achieve a better overall visual effect, and the contour details of ground objects in the images are clearer. The images before and after SR reconstruction are applied to land cover classification; the results show that the images reconstructed by SIEGAN yield better classification accuracy, which further verifies the effect of SIEGAN for SR reconstruction of remote sensing images.
In future work, we will further explore semi-supervised or unsupervised SR reconstruction models to broaden the research scope of image SR reconstruction. Furthermore, this article only discusses the application of SR reconstructed remote sensing images to land cover classification. Exploring suitable SR reconstruction models for a wider range of practical applications, such as target detection and instance segmentation, is also an interesting and important direction.
Yujia Fu received the B.S. degree in information management and information system from the Northeast Forestry University, Harbin, China, in 2020. She is currently working toward the Ph.D. degree in forestry engineering with the College of Information and Computer Engineering, Northeast Forestry University, Harbin, China, under the supervision of Prof. M. Wang.
Her research interests include remote sensing image processing and deep learning.

M. Wang is currently a Professor with the College of Information and Computer Engineering, Northeast Forestry University, Harbin, China. She has authored/coauthored a number of publications in international journals and conferences. Her research interests include image processing, deep learning, and data mining. Prof. Wang is a reviewer for several international journals.