Enhanced Resolution of FY4 Remote Sensing Visible Spectrum Images Utilizing Super-Resolution and Transfer Learning Techniques

Remote sensing images acquired by the FY4 satellites are crucial for regional cloud monitoring and meteorological services. Inspired by the success of deep learning networks in image super-resolution, we applied image super-resolution to FY4 visible spectrum (VIS) images. However, training a robust network directly for FY4 VIS image super-resolution remains challenging because high-resolution FY4 sample data are scarce. Here, we propose a super-resolution and transfer learning model, FY4-SR-Net, composed of a pretraining model and a fine-tuning model. The pretraining model was developed using a deep residual network trained on a large number of FY4 A 4 km and 1 km resolution VIS images. The knowledge derived from the 4 km to 1 km resolution mapping was transferred to FY4 B 1 km to 0.25 km resolution VIS images. FY4-SR-Net is fine-tuned with a limited set of 1 km and 0.25 km resolution panchromatic images and then produces super-resolution VIS images from 1 km FY4 inputs. Using a one-day FY4 test dataset for qualitative and quantitative evaluations, FY4-SR-Net outperformed the classic bicubic interpolation approach with a 16.12% reduction in average root-mean-square error and a 2.97% rise in average peak signal-to-noise ratio. The average structural similarity value increased by 0.0026. This article provides a new precedent for improving the spatial resolution of the FY4 series of meteorological satellites, which has important scientific significance and application value.


I. INTRODUCTION
The FY4 satellite series supports weather forecast analysis, short-term climate prediction, environmental management, resource development, disaster prevention and mitigation, and scientific research. This satellite is an essential component of China's comprehensive meteorological observation satellite network [1]. The spatial resolution of a remote sensing image is an important quality measure and a leading indicator of a country's aerospace capabilities. Remote sensing images with metric and submetric spatial resolution are already widely used, but their temporal resolution is still rather poor. Conversely, some remote sensing images have low spatial resolution but high temporal resolution, such as FY4 satellite VIS images with a temporal resolution of 5 min and a spatial resolution between 0.5 and 1 km. Such a mismatch between spatial and temporal resolution scarcely satisfies the growing demand for FY4 images in production applications, and it greatly impedes the advancement of meteorological remote sensing technology [2]. Significant progress has been made in remote sensing image processing with various advanced deep networks [3]–[6], and deep learning algorithms offer a cost-effective and efficient solution to this mismatch problem [7]. Super-resolution (SR) is a technique for enhancing image spatial resolution by reconstructing high-resolution (HR) images from single or multiple low-resolution (LR) images. SR methods fall into two categories: classical interpolation methods [8], [9], [10] and deep-learning-based methods [11], [12], [13], with the latter subdivided further into single-frame SR methods [14], [15] and multiframe SR (MFSR) methods [16], [17]. MFSR constructs HR images by acquiring multiple LR images of the identical scene with the same or distinct sensors [18]. Merino and Núñez [19] presented a technique called super-resolution variable-pixel linear reconstruction for reconstructing an HR image from many LR images recorded over an extended period of time.
This technique was adapted from Drizzle [20], which was designed to operate on dithered, undersampled astronomical images. Shen et al. [21] developed an SR approach for MODIS remote sensing images in which image registration was conducted in both the range and spatial domains before image reconstruction. Using panchromatic Landsat7 images captured on several days, Li et al. [22] proposed and tested a MAP-based SR approach with a general hidden Markov tree model. Fan et al. studied SR with overlapping Gaofen-2 images. Because of its simplicity and the excellent performance achieved through intensive supervised training, single-frame SR has become the workhorse.
Initial studies on single-frame SR for remote sensing images used simple structures, such as the three-dimensional (3-D) fully convolutional neural network and the remote sensing image convolutional neural network with direct superposition of convolutional layers [24]–[26]. An unsupervised deep generative network was developed by Haut et al. [27] to overcome the lack of benchmark training data for remote sensing image super-resolution. With the advent of super-resolution networks in the realm of computer vision, modular designs, such as the multiperception attention network [28] and the dense-sampling super-resolution network [29], were adopted for single-frame SR. For better segmentation accuracy after super-resolution, Lei et al. [30] proposed the S2Net network, which can simultaneously accomplish SR and image segmentation for remote sensing images. Coupling remote sensing images with the generated HR images for a discriminative network, Lei et al. [31] developed a coupled discriminative generative adversarial network, which performed well on the super-resolution task in the low-frequency region.
When implementing deep-learning-based super-resolution networks, the quality and quantity of available high- and low-resolution samples, as well as training images, are all important considerations. The inherently low resolution of FY4 remote sensing VIS images makes obtaining sufficient HR training data problematic. To address this issue, we propose a novel two-step methodology. The first step is to design a deep residual model that is pretrained on chunked four-fold super-resolution feature knowledge. The second step is to fine-tune the system with higher-resolution remote sensing image pairs. Based on this approach, the resolution of FY4 visible spectrum (VIS) images can be enhanced. Several image super-resolution experiments use processed image data instead of raw grayscale values as model inputs to build training networks, resulting in a loss of pixel information and a reduction in model generalizability [32]. In addition, to our knowledge, there have been few super-resolution studies of the FY4 global-scale meteorological satellites.
The FY4-SR-Net model, which we created in accordance with the pretraining and fine-tuning theory of image transfer learning, effectively addresses these issues. By using FY4 A 4 km–1 km monitoring VIS image data (gray values) as pretraining data, we avoid the scarcity of HR remote sensing training datasets. Training with higher-resolution remote sensing image pairs is then used to fine-tune the system. Our proposed method for improving resolution is more accurate and performs better than classical interpolation in terms of both quality and clarity. A higher-resolution FY4 satellite VIS image can visualize the finer structure and type of cloud masses.
The rest of this article is organized as follows. Section II describes the data and our network structure. Section III demonstrates the effectiveness of the proposed method using experimental results. Finally, Section IV concludes this article with some comments on future work.

II. DATA AND METHODS

A. Data Collection and Preprocessing
1) FY4 A: FY4 A belongs to the second generation of geostationary meteorological satellites in China. It is designed to meet the country's wide range of environmental and space science needs [33], such as those in the oceans, agriculture, forestry, and hydropower. The FY4 A satellite was formally launched on December 11, 2016, and is equipped with an advanced geostationary radiation imager (AGRI), a geostationary interferometric infrared sounder (GIIRS), a lightning mapping imager, and a space environment package. The scanning imaging radiometer is responsible for collecting cloud data. Fig. 1 shows the FY4 A satellite band and resolution information. With its fourteen channels, the satellite can detect aerosols and snow, as well as the different phases of clouds and high- and mid-level water vapor. Compared with the single visible channel of FY2, FY4 A produces color satellite cloud maps for the first time and generates regional observation images at one-minute intervals.
Users can access FY4 A data from the official website of the National Meteorological Satellite Center [34]. The main payload of FY4 A, the AGRI, has a sophisticated double-scanning mirror mechanism that enables precise and flexible 2-D pointing, allowing for rapid, minute-level scanning of regions. The available products include: full disk data with 4 km resolution; full disk data with 1 km resolution; China zone data with 4 km resolution; and China zone data with 1 km resolution. Full disk data are identified as DISK, and China zone data as REGC [35]. China zone REGC data with a 5-min temporal resolution were chosen as the training data for this article.
2) FY4 B: FY4 B was launched on June 3, 2021, and is primarily used for operational meteorological satellite monitoring. The average life expectancy of FY4 B is seven years, longer than that of FY4 A. Its successful launch is crucial for ensuring the upgrade of China's geostationary meteorological satellites and their continued reliability and stability. The addition of the geostationary orbit high-speed imager to FY4 B provides the capability for high-speed, HR regional imaging. The GIIRS increases the spatial resolution in the VIS to 1 km. FY4 B is capable of rapid imaging, i.e., a 1-min interval with a spatial resolution of up to 250 m over a region. The available products include: China zone data with 2 km resolution; China zone data with 500 m resolution; and China zone data with 250 m resolution. The 2 km resolution data were not utilized in this article. Datasets with a resolution of 1 km were created by downsampling the 500 m resolution data using an interpolation technique typical for image super-resolution. Fig. 2 depicts a small portion of the FY4 B imaging data. With the inclusion of the multichannel AGRI, the FY4 B spatial resolution in the 2.1 and 3.5 μm bands has been enhanced to 2 km. The FY4 B and A twin-star network provides China and other countries along the "Belt and Road" with weather monitoring, forecasting, and emergency disaster prevention and mitigation services.
3) Data Preprocessing: The FY4 data were projected and converted into a unified coordinate system in order to eliminate position bias. To minimize sensor errors, the relationship between digital quantization values and radiation brightness values was established through radiometric calibration [36]. The blue band (0.45–0.49 μm) of FY4 helps obtain clear cloud boundary information when drawing cloud cover maps [37], so we used the NOMChannel01 0.47 μm VIS channel of FY4 A data, collected from September 13 to 20, 2021, for model pretraining. Deep learning has been shown to be effective in improving the resolution of LR multispectral images with the aid of high spatial resolution panchromatic (PAN) images [38]. Transfer learning and model testing used the NOMChannel01 PAN channel of FY4 B data from November 25 to 30, 2021. Gray values of the FY4 data were normalized to the range 0 to 1 before being fed into the model in order to remove the influence of local perception characteristics. During preprocessing, HR FY4 1000 m images were split into 128×128 patches, while LR FY4 4000 m images were split into 32×32 patches. The HR FY4 250 m images of the B star and the LR FY4 1000 m images underwent the same processing to generate the fine-tuning dataset. In addition to fragmentation, FY4 data with missing values were removed from the dataset. The final dataset consists of 316 199 groups, on which a random 80/20 training/testing split was performed. Python GDAL was used for preprocessing.
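The normalization and paired tiling described above can be sketched as follows. This is a minimal sketch: `normalize` and `tile` are hypothetical helper names, and the NaN check stands in for the article's removal of samples with missing values.

```python
import numpy as np

def normalize(gray):
    """Scale raw FY4 gray values into [0, 1], as described in the text."""
    lo, hi = gray.min(), gray.max()
    return (gray - lo) / (hi - lo + 1e-12)

def tile(image, size):
    """Split a 2-D array into non-overlapping size x size patches,
    discarding incomplete border tiles and tiles with missing values."""
    h, w = image.shape
    patches = []
    for r in range(0, h - size + 1, size):
        for c in range(0, w - size + 1, size):
            patch = image[r:r + size, c:c + size]
            if not np.isnan(patch).any():  # drop tiles with missing values
                patches.append(patch)
    return np.stack(patches) if patches else np.empty((0, size, size))

# Paired tiling: each 128x128 HR (1 km) patch aligns with a 32x32 LR (4 km) patch.
```

A 256×256 HR frame, for example, yields four 128×128 HR patches, and its 4×-downsampled 64×64 counterpart yields four aligned 32×32 LR patches.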

B. FY4-SR-Net Model

1) Structure of the Pretraining Network:
Due to the lack of HR monitoring data, directly training a robust network for FY4 SR remains difficult. Taking inspiration from transfer learning, a pretrained FY4 model is built using a deep residual model and a large number of FY4 4 km resolution and 1 km resolution images as training data. As shown in Fig. 3, the FY4 model knowledge gained from 4 km resolution and 1 km resolution data is transferred to 1 km resolution and 0.25 km resolution data, and the network is fine-tuned with restricted 1 km resolution and 0.25 km resolution data to build an FY4-SR-Net that meets the FY4 satellite's 1 km super-resolution requirement. The final model takes the low-resolution FY4 1 km image FY_L as input and produces the high-resolution FY4 250 m image FY_H.
As illustrated in Fig. 4, the pretrained FY4 network consists of low-level feature extraction, element-wise summation, and upsampling layers. A collection of features is extracted by the first convolutional layer of the pretrained FY4 network. The model is trained with a residual network: if the input of each residual block is denoted x and the expected output f(x), the block learns the residual map g(x) = f(x) − x. The element-wise summation layer then adds the original input x and g(x) to produce the desired output f(x). The residual network structure can achieve higher precision by enabling deeper network layers [39].
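The residual mapping above can be sketched in PyTorch. The channel count and the two-convolution body are assumptions, since the article does not give the exact block layout:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual block: the body learns the residual map g(x) = f(x) - x,
    and the element-wise summation restores f(x) = x + g(x)."""

    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),  # ReLU activation, as in the article
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        # element-wise summation of the input x and the residual g(x)
        return x + self.body(x)
```

Because the block outputs x plus a learned correction, stacking many such blocks keeps gradients flowing and lets the network go deeper without degradation.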
The subpixel convolution method proposed in [40] is used for upsampling, as depicted in Fig. 5. First, features with r^2 channels are obtained by convolution, and then the HR image is obtained by periodic shuffling.
In comparison with the typical deconvolution layer, subpixel convolution can map LR data into HR space without additional computation. Since ReLU has good nonlinear fitting performance [41], it is used as the model's activation function.
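A minimal sketch of this subpixel upsampling in PyTorch, using the built-in `nn.PixelShuffle` for the periodic rearrangement; the kernel size and channel counts are assumptions:

```python
import torch
import torch.nn as nn

class SubpixelUpsample(nn.Module):
    """Subpixel convolution: a conv layer produces r^2 feature channels
    per output channel, and pixel shuffling rearranges them into an
    r-times larger image."""

    def __init__(self, in_channels, r=4):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, in_channels * r * r,
                              kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(r)  # (C*r^2, H, W) -> (C, H*r, W*r)

    def forward(self, x):
        return self.shuffle(self.conv(x))
```

With r = 4, a 32×32 LR feature map becomes a 128×128 output, matching the four-fold scale factor used in the article.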
2) Fine-Tuning and Loss Function: Transfer learning is the process of adapting the knowledge or patterns acquired in one area or task to a different but similar domain or task. As shown in Fig. 6, the knowledge gained by solving the super-resolution of FY4 A 4 km resolution and 1 km resolution data is used to construct a higher-resolution FY4-SR-Net model. Some layers of the pretrained network were frozen, and the finite set of HR FY4 B 1–0.25 km resolution data pairs was then used to fine-tune the network.
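The freeze-then-fine-tune step might look like the following sketch. `freeze_early_layers` and the number of trainable layers are hypothetical, as the article does not state which layers were frozen:

```python
import torch.nn as nn

def freeze_early_layers(model: nn.Module, n_trainable: int = 2):
    """Freeze all but the last n_trainable child modules, so fine-tuning
    with the limited FY4 B 1 km / 0.25 km pairs only updates the top layers."""
    children = list(model.children())
    for child in children[:-n_trainable]:
        for p in child.parameters():
            p.requires_grad = False  # frozen: gradients are not computed
    return model
```

After freezing, the fine-tuning optimizer is typically built only from parameters with `requires_grad=True`, so the pretrained low-level features are preserved while the top layers adapt to the higher-resolution pairs.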
In the field of single-image SR reconstruction, the pixel loss function and the perceptual loss function are the most commonly employed loss functions. This article uses the former to minimize the pixel error between the output and the target when training the model. FY4-SR-Net optimizes the network with the mean square error (MSE). In addition, we analyzed additional metrics, e.g., the mean absolute error, SSIM, and PSNR. The MSE strongly penalizes large deviations, which helps safeguard the model's stability and facilitates the reconstruction of FY4 data with high resolution.
3) Evaluating Metrics: Quantitative evaluation and visual inspection were used to assess the quality of the super-resolution image data. The RMSE is a widely used numerical accuracy metric. It does not indicate individual point errors but describes the overall dispersion of the FY4 data

RMSE = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} (H_i - h_i)^2 }    (1)

where H_i is the value of the original image, h_i is the value of the super-resolution image, and n is the number of sampling points. The PSNR measures the pixel difference between reconstructed HR images and actual images and is the most popular metric for assessing image quality. The MSE is calculated before the PSNR and is frequently used to construct loss functions. Given two m × n monochromatic images I and K, one of which is a noisy approximation of the other, the MSE and PSNR are calculated as

MSE = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} (I(i,j) - K(i,j))^2    (2)

PSNR = 10 \log_{10} \left( \frac{\Delta S^2}{MSE} \right)    (3)

where \Delta S represents the maximum color value of the image points. The \Delta S value equals 255 if each sampling point is represented by 8 bits. However, the grayscale values of the FY4 data are frequently far above 255, so \Delta S is modified to be the difference between the maximum and minimum gray values in the FY4 data. The higher the PSNR value, the better the image quality. The SSIM (4) measures the similarity of two images and has a wide range of applications in both image deblurring and image super-resolution

SSIM(x, y) = \frac{(2 u_x u_y + c_1)(2 \sigma_{xy} + c_2)}{(u_x^2 + u_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}    (4)
where x and y are the real image and the super-resolution image, respectively; u_x and u_y represent the mean values of x and y; σ_x and σ_y are the standard deviations; and σ_xy is the covariance. SSIM ranges from 0 to 1: the larger the value, the smaller the difference between the output image and the distortion-free image, and hence the better the image quality. When the two images are identical, SSIM equals 1.
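The three metrics can be sketched in NumPy as follows. Note the hedges: `ssim_global` uses a single global window for brevity, whereas standard SSIM averages over local sliding windows, and the constants `c1`, `c2` follow the common defaults for unit-range data; the PSNR helper implements the modified ΔS described above:

```python
import numpy as np

def rmse(H, h):
    """Root-mean-square error between original (H) and SR (h) values."""
    return np.sqrt(np.mean((H - h) ** 2))

def psnr(I, K, delta_s=None):
    """PSNR with the modified dynamic range: delta_s defaults to the
    max-minus-min gray-value spread, since FY4 values often exceed 255."""
    if delta_s is None:
        delta_s = I.max() - I.min()
    mse = np.mean((I - K) ** 2)
    return 10.0 * np.log10(delta_s ** 2 / mse)

def ssim_global(x, y, c1=1e-4, c2=9e-4):
    """Single-window (global) SSIM sketch for images scaled to [0, 1]."""
    ux, uy = x.mean(), y.mean()
    sx, sy = x.std(), y.std()
    sxy = ((x - ux) * (y - uy)).mean()
    return ((2 * ux * uy + c1) * (2 * sxy + c2)) / \
           ((ux ** 2 + uy ** 2 + c1) * (sx ** 2 + sy ** 2 + c2))
```

For two identical images the sketch returns SSIM = 1, and any constant offset between the images lowers SSIM while inflating RMSE, matching the interpretations given above.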

III. RESULTS
The network training and experiments were implemented with PyTorch [42]. The model consists of 89 convolutional layers, with the convolutional kernel size in each layer set to 3 and the padding set to 1. The input data were passed through the convolutional layers, and the output was resized with zero padding to keep a constant output matrix size for the subsequent layer. We utilized the Adam optimizer with a learning rate of 0.0001 to train the model on the large dataset. An exponential decay formula was employed to adjust the learning rate. The training batch size was set to 128. The models underwent 200 epochs of training, with each epoch consisting of 2400 training steps. To prevent overfitting, early stopping was used to terminate training when the model's performance began to degrade on the validation set (patience of 7). For model training and evaluation, a workstation with four NVIDIA RTX 2080 Ti 11-GB GPUs was employed.
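This training configuration can be sketched as follows with dummy data. The decay factor `gamma`, the stand-in model, and the use of the training loss as a validation proxy are assumptions for illustration only, since the article does not give the exact exponential decay formula:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(8, 1, 3, padding=1))  # stand-in for FY4-SR-Net
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # Adam, lr = 0.0001
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)
criterion = nn.MSELoss()  # pixel (MSE) loss, as in the article

best_val, patience, wait = float("inf"), 7, 0  # early stopping, patience 7
for epoch in range(5):                          # the article trains up to 200 epochs
    x = torch.rand(4, 1, 32, 32)                # dummy LR batch
    target = torch.rand(4, 1, 32, 32)           # dummy HR target (same size here)
    optimizer.zero_grad()
    loss = criterion(model(x), target)
    loss.backward()
    optimizer.step()
    scheduler.step()                            # exponential learning-rate decay
    val_loss = loss.item()                      # stand-in for validation loss
    if val_loss < best_val - 1e-6:
        best_val, wait = val_loss, 0
    else:
        wait += 1
        if wait >= patience:                    # stop when validation degrades
            break
```

In a real run the early-stopping check would use a held-out validation set, and the batch size would be 128 rather than the toy batches shown here.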

A. Pretraining and Fine-Tuning Performance
For our proposed FY4-SR-Net model, the pretraining network was trained with 1318 remote sensing image samples of FY4 A from September 13 to 20, 2021; in other words, the pretraining acquires chunked four-fold super-resolution feature information. FY4 B data from November 25 to 29, 2021 were utilized to fine-tune the pretrained network. The remaining FY4 B monitoring data, at 2-h intervals on November 30, 2021, were selected for model testing, as shown in Fig. 7. FY4 B is a geostationary satellite positioned over the equator, and UTC 12:00–20:00 corresponds to nighttime in the Eastern Hemisphere. Passive optical sensors therefore cannot gather reflectance spectra during this period, rendering the image maps nearly dark.
In addition, we trained two deep-residual-network-based pretraining networks with FY4 A and FY4 B data, respectively, referred to as FY4 A pretraining and FY4 B pretraining. After training these networks under identical settings, the testing dataset D_test was used to compare the performance of FY4 SR with a four-fold upscaling factor. We compared the performance of these training strategies with the bicubic interpolation method as a baseline [43]. Note that interpolation was performed on the LR image of each sample to create an HR image. We quantitatively evaluated the three training strategies with the evaluation metrics, including the RMSE, PSNR, and SSIM.
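The bicubic baseline can be produced, for example, with PyTorch's built-in interpolation; `bicubic_baseline` is a hypothetical helper name:

```python
import torch
import torch.nn.functional as F

def bicubic_baseline(lr, scale=4):
    """Four-fold bicubic upscaling of an LR batch (N, C, H, W), used as
    the comparison baseline for the learned SR models."""
    return F.interpolate(lr, scale_factor=scale, mode="bicubic",
                         align_corners=False)
```

Applying this to each 32×32 LR test sample yields a 128×128 image against which the learned models' RMSE, PSNR, and SSIM gains are measured.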
As illustrated in Tables I–III, FY4-SR-Net outperformed bicubic interpolation and the other two training strategies for most of the test samples in terms of the RMSE, PSNR, and SSIM. The FY4-SR-Net RMSE was lowered by an average of 16.12%, while the FY4 A pretraining and FY4 B pretraining RMSEs were reduced by averages of 1.47% and 14.26%, respectively. The FY4-SR-Net PSNR increased by an average of 2.97%, and the FY4 A pretraining and FY4 B pretraining PSNRs increased by averages of 1.06% and 2.61%, respectively. The FY4-SR-Net SSIM value increased by an average of 0.0026, and the FY4 A pretraining and FY4 B pretraining both increased by the same average of 0.0023. Fig. 8 shows the line chart of the relative RMSE difference (in percent) between the SR results and the bicubic method over one day of testing. The level of improvement varies with time, as the test data change substantially between acquisition moments, marked with * in Tables I–III. For remote sensing images acquired from 12:00 to 20:00 UTC during the near-dark period, the FY4 A pretraining model has difficulty enhancing the resolution well, performing even worse than bicubic interpolation. In contrast, the fine-tuned FY4-SR-Net model achieved better super-resolution results than the FY4 B pretraining strategy.
The results of comparing the PSNR with the classic bicubic approach on November 30, 2021 are depicted in Fig. 9. The PSNR behavior of the three training strategies is comparable to that of the RMSE. The large performance gap between FY4-SR-Net and FY4 B pretraining implies that transfer learning can significantly enhance the super-resolution effect. The SSIM metric can be used to quantify the degradation in image quality caused by processing, such as data interpolation or data compression. As illustrated in Fig. 10, the FY4-SR-Net model exhibits the greatest super-resolution effect, followed by FY4 B pretraining and FY4 A pretraining. These results demonstrate that transfer learning is effective, particularly when the number of high-resolution training samples is limited.

B. Quality of SR Result for FY4
We compare the qualitative results of our method with the original LR images and bicubic interpolation. In general, bicubic interpolation can increase spatial resolution effectively, but the resulting image has an over-smoothed texture and limited recovery of cloud information. Fig. 11 depicts the SR outcomes for FY4 B 1 km resolution PAN images at a scale factor of 4. Several cloud details have been properly reconstructed in comparison with the initial FY4 B LR data. As demonstrated in Fig. 12, the benefit of our proposed approach is also apparent in shaping the cloud textures of FY4 A 1 km resolution VIS images. Visually and quantitatively, the FY4-SR-Net model surpasses the conventional bicubic technique and provides improved performance in detail restoration. With this technology, HR FY4 satellite VIS images can be obtained. In short, all three training methods, with the exception of the dark time period, produce superior super-resolution outcomes compared with bicubic interpolation. The addition of fine-tuning to our model effectively compensates for and enhances the pretraining outcomes. The pretraining- and fine-tuning-based FY4-SR-Net model achieved greater super-resolution outcomes than the direct training method.

IV. CONCLUSION
To meet the super-resolution requirements of the FY4 satellites, a transfer learning method based on pretraining and fine-tuning is presented in this article. A deep residual network is pretrained with a large number of FY4 4 km resolution and 1 km resolution VIS images and then fine-tuned with the limited 1 km resolution and 0.25 km resolution data. Experiments show that our proposed FY4-SR-Net outperforms the bicubic baseline, with a 16.12% improvement in average RMSE, a 2.97% increase in average PSNR, and a 0.0026 increase in average SSIM for monitoring data at 2-h intervals on November 30, 2021.
With its advantages of high super-resolution accuracy and low cost, FY4-SR-Net has the potential to be widely used for VIS image SR across the FY4 satellite series in various locations, particularly mainland China. With this super-resolution technique, the FY4 series satellites provide images with improved spatial resolution at the same high temporal resolution, enhancing their ability to monitor regional weather events and provide meteorological services.
In conclusion, the development of FY4-SR-Net based on transfer learning can be more precise and effective than interpolation in terms of both qualitative and quantitative resolution enhancement. This is the first time, to our knowledge, that a pretraining and fine-tuning structured SR method has been proposed for the FY4 remote sensing VIS image data. We anticipate that our research will establish a precedent for super-resolution image processing based on transfer learning. As a result, numerous meteorological satellites will be able to deliver more precise and clearer HR images. In the future, we will perform comparisons with other single-frame SR methods. Our research will focus on enhancing satellite image SR performance at certain time intervals and examining data input and architecture design to further reduce the model prediction errors.