Single Image Super-Resolution: From Discrete to Continuous Scale Without Retraining

Convolutional neural network (CNN)-based single image super-resolution (SR) methods have achieved superior performance on some discrete-scaling factors, including 2, 3, and 4. However, in practical applications the scaling factor for SR should be continuous rather than discrete. Previous CNN-based SR models usually yield poor results for non-integer-scaling factors, sometimes even worse than those of the conventional bicubic method. To extend CNN-based SR models to continuous scale, this paper proposes a multiple-scaling-based SR (MSSR) method that combines one integer-scaling-factor SR step with one or two non-integer-scaling-factor SR steps, without retraining networks. For a non-integer-scaling factor, the MSSR method first computes an optimal integer-scaling factor according to data similarity and chooses the corresponding pre-trained model for the next stage. Then, an existing CNN-based model is used to perform the integer-scaling-factor SR. Finally, the output is scaled to the target size. The proposed MSSR method can extend a variety of existing CNN-based SR models from discrete to continuous-scaling factors. Experimental results with six CNN-based SR models demonstrate that the MSSR method effectively improves the performance of existing CNN-based SR models for continuous-scaling-factor SR without retraining networks. Furthermore, a comparison with a magnification-arbitrary method, called Meta-SR, shows that the proposed MSSR method usually outperforms Meta-SR for scaling factors greater than or equal to 2.


I. INTRODUCTION
Single image super-resolution (SR) reconstructs a high-resolution (HR) image from a low-resolution (LR) image. As a classic problem in multimedia and image processing, the SR problem is inherently ill-posed, and no unique solution exists for SR. Convolutional neural network (CNN)-based SR methods have achieved great success in discrete-scaling-factor SR. For example, when the CNN for SR (SRCNN) [1] method is applied to an integer-scaling-factor SR task, the peak signal-to-noise ratio (PSNR) of the reconstruction result is usually improved by at least 3 dB compared with the bicubic method. However, in practical applications the scaling factor for SR should be continuous rather than discrete. In addition, the degradations in real-world LR images are far more complicated. Therefore, blind SR [2]-[5] methods have been presented to handle such complex degradations. In this paper, the degradations of the LR images only include downsampling by arbitrary scaling factors.
The associate editor coordinating the review of this manuscript and approving it for publication was Qiangqiang Yuan.
Recently, an arbitrary-scaling-factor SR method, called Meta-SR, has been proposed to reconstruct high-resolution images at arbitrary scales, including integer- and non-integer-scaling factors [7]. It uses two fully connected layers to generate a dynamic kernel for the upscaling of each scaling factor. However, Meta-SR needs more parameters to predict kernel weights for each scaling factor. Furthermore, Meta-SR needs to be retrained when combined with other advanced SR models.
To the best of our knowledge, no SR method other than Meta-SR has studied arbitrary-scale SR. In this paper, we first experimented with existing CNN-based SR models for both integer- and non-integer-scaling factors. It is worth noting that the CNN-based SR models we experimented with are trained for only some integer-scaling factors, and images for non-integer-scaling factors are not included in the training. Because the number of possible scaling factors is unlimited, we consider it impossible, and of no practical value in real applications, to retrain SR models for all scaling factors. Therefore, we use the SR models trained with integer-scaling-factor data to upscale the images. We found that the tested CNN-based SR models do not generalize well: they perform poorly for non-integer-scaling factors and are sometimes even worse than the bicubic interpolation method.
As shown in Fig. 2, the reconstruction result of the very deep CNN for SR (VDSR) [8] method (Fig. 2(b)) shows visually noticeable distortions. In particular, the white blocks in the magnified plaid tablecloth region become smaller, while the black blocks become larger. Furthermore, its PSNR and structural similarity (SSIM) values are lower than those of the bicubic interpolation result (Fig. 2(c)). The performance decrease is caused by the difference between the training and testing data. The SR models are trained using some integer-scaling factors, the smallest of which is 2. Because images with non-integer-scaling factors, e.g., 1.4 in Fig. 2, are not included in the training of the SR models, the models are not guaranteed to produce a better result than bicubic interpolation. As shown in Fig. 2(d), our proposed method can improve the performance of CNN-based SR models, e.g., VDSR, for non-integer-scaling factors without retraining any SR networks. Moreover, networks with built-in upsampling modules, such as RCAN [6], cannot handle non-integer-scaling-factor SR on their own. This problem can be solved without training the networks by combining them with the proposed method. As shown in Fig. 1, the proposed method combined with RCAN not only enables RCAN to perform arbitrary-scale SR like bicubic interpolation, but also retains the effective reconstruction capability of RCAN, producing images with clear structures and rich details.
To extend CNN-based SR models to continuous scale, this paper proposes a multiple-scaling-based SR (MSSR) method for continuous-scaling-factor SR that requires no retraining of networks. To make full use of the advantages of the CNN-based SR models in integer-scaling-factor SR, the proposed MSSR method combines one integer-scaling-factor SR step with one or two non-integer-scaling-factor SR steps. For the example shown in Fig. 3, we first upscale an LR image with a resolution of H × W into images with resolutions of 2H × 2W, 3H × 3W, and 4H × 4W using the bicubic interpolation method. We then use each of these images as the input for the VDSR model to compute the SR result. Finally, we scale each SR result to the target resolution of 1.4H × 1.4W using the bicubic interpolation method. As shown in Fig. 3(d)-(f), Result1, Result2, and Result3 use the intermediate resolutions of 2H × 2W, 3H × 3W, and 4H × 4W, respectively. The result from directly applying the VDSR model (Fig. 3(b)) reveals some distortions, particularly in the magnified eye region. The results of the proposed MSSR method (Fig. 3(d)-(f)) are visually improved and have larger PSNR and SSIM values. Fig. 3(d)-(f) also show that feeding different intermediate resolutions to VDSR leads to different final results. Therefore, we also compute the optimal integer magnification factor for any non-integer-scaling factor.
The proposed MSSR method can be applied to a variety of CNN-based SR models. Experimental results with six CNN-based SR models demonstrate that the MSSR method effectively improves the performance of existing CNN-based SR models for continuous-scaling-factor SR without retraining networks. Moreover, the proposed MSSR method usually outperforms Meta-SR for scaling factors greater than or equal to 2. The main contributions of this paper are as follows.
1) We observed that existing CNN-based SR methods typically do not perform well on non-integer-scaling-factor SR, which limits their usage in practical applications.
2) We propose a multiple-scaling-based SR method that takes advantage of both CNN-based and conventional SR methods. The proposed MSSR method is simple, efficient, and effective.

II. RELATED WORK
Single image super-resolution methods can be divided into interpolation-based, reconstruction-based, example-based, and CNN-based methods. The interpolation- and reconstruction-based methods [9] usually cannot perform well for large-scaling-factor SR. Example-based methods include self- and external-example-based methods. The self-example-based methods (cf. [10]) utilize self-similarity attributes and extract example pairs from an LR image across different scales. The external-example-based methods learn a mapping between LR and HR patches from external datasets. The mapping is learned using nearest neighbor [11], manifold embedding [12], random forest [13], or sparse representation [14], [15]. The example-based methods usually cannot work well for large-scaling factors either.
CNN-based SR methods have improved the performance for some discrete-scaling factors, such as 2, 3, and 4. A pioneering work on SR for arbitrary scale is Meta-SR [7]. Meta-SR proposes the meta-upscale module to predict the weights of the convolution kernels for varying scaling factors. The input of the meta-upscale module is a scale-related and coordinate-related vector. However, the meta-upscale module introduces additional parameters for training and storage.
Different from Meta-SR, the other CNN-based methods focus on some discrete-scaling factors. For example, Dong et al. [1] first proposed a three-layer CNN for SR (SRCNN) to optimize the feature extraction, non-linear mapping, and image reconstruction stages jointly in an end-to-end manner. It takes a bicubic-interpolated enlarged image as the input image and learns the mapping relationship between the bicubic-interpolated image and the original HR image. Some following works [8], [16]-[18] also use the bicubic-interpolated enlarged image as the input image. VDSR [8] investigates residuals between the input LR and output HR images. By using an adjustable gradient truncation strategy, VDSR learns a robust SR model. On this basis, Kim et al. proposed a deeply recursive convolutional network (DRCN) [16] for SR. Tai et al. presented a deep recursive residual network (DRRN) [17] for SR. Besides, a deep end-to-end persistent memory network (MemNet) for SR [18] tackles the long-term dependency problem in the previous CNN-based SR methods. These networks take the bicubic-interpolated enlarged image as the input image and have no upsampling module, so the output image is of the same size as the input image.
To increase the SR speed and reduce computation, researchers proposed embedding an upsampling module at the end of the network, so that LR images can be sent directly to the network as the input without pre-upsampling. Many works [19]-[22] use deconvolution layers as the upsampling module. Dong et al. proposed a faster version of SRCNN (FSRCNN) [19]. FSRCNN uses the deconvolution layer as a reconstruction block, takes the original LR image as the input, and is more efficient. Similarly, a deep encoding and decoding framework (RED) for SR [20] also adds deconvolution layers to upsample the image and combines convolutions and deconvolutions to extract primary image content and recover details. A Laplacian-pyramid-inspired CNN for SR (LapSRN) [21] predicts high-frequency residuals in a coarse-to-fine manner and uses deconvolution to progressively generate upsampled images with 2× magnification. Different from the above-mentioned feedforward network structures, a deep back-projection network (DBPN) [22] proposed by Haris et al. exploits iterative up- and down-sampling layers, providing an error feedback mechanism for projection errors at each stage. Other works [23], [24] use sub-pixel convolution layers that can map LR data into HR space at little additional computational cost. Furthermore, a residual dense network (RDN) [24] applies both a deconvolution layer and a sub-pixel convolution layer. Its innovation lies in using densely connected convolutional layers and residual dense blocks to make full use of the hierarchical features from the original LR images. To mitigate the problem of low-frequency information hindering the representational ability of CNNs, a very deep residual channel attention network (RCAN) [6] applies a channel attention mechanism to rescale channel-wise features adaptively by considering interdependencies among channels.
In summary, CNN-based SR methods except Meta-SR focus only on discrete-integer-scaling-factor SR. Furthermore, although the meta-upscale module of Meta-SR could dynamically predict the weights for the upscale module, it brings more parameters for training and storage. And Meta-SR usually shows only comparable performance when compared with the proposed MSSR method for large scaling factors.

III. PROPOSED METHOD
In this section, we first present our observations on CNN-based SR models for non-integer-scaling factors and then provide details about the proposed MSSR method.

A. OUR OBSERVATION
We first observed that existing CNN-based SR models usually do not perform well for non-integer-scaling-factor SR. As shown in Fig. 3, the result computed by directly applying the VDSR model (Fig. 3(b)) reveals some distortions in the magnified eye region of the bird. The PSNR and SSIM values of the VDSR result are also smaller than those of the other results, and even worse than those of the bicubic interpolation result (Fig. 3(c)). The existing VDSR model is trained only for integer-scaling factors, which results in its inferior performance on non-integer-scaling factors.

B. FRAMEWORK OF MSSR METHOD
To improve the performance of CNN-based SR models for non-integer-scaling factors, this paper proposes a multiple-scaling-based SR (MSSR) method. As shown in Fig. 4, the proposed MSSR method consists of three steps: optimal integer-scaling factor selection, CNN-based super-resolution, and sampling to target resolution. Here, $I_{LR} \in \mathbb{R}^{3 \times H \times W}$ and $I_{HR} \in \mathbb{R}^{3 \times tH \times tW}$ represent the input and output, respectively, where $t$ is the scaling factor.
In the optimal integer-scaling factor selection step, we select an optimal integer-scaling factor $s$ for any continuous scale $t$, as CNN-based methods perform well only on some integer-scaling factors. The optimal integer-scaling factor is computed as described in Section III-C. For a CNN-based SR method whose input and output images have the same size of $sH \times sW$ (such as VDSR [8], DRRN [17], and MemNet [18]), the proposed MSSR method upsamples the input image $I_{LR} \in \mathbb{R}^{3 \times H \times W}$ to the size of $sH \times sW$. Because bicubic interpolation preserves more details than the other interpolation methods, we use bicubic interpolation to conduct the upsampling. For a CNN-based SR method whose input image has a size of $H \times W$ and whose output image has a size of $sH \times sW$ (such as DBPN [22], RDN [24], and RCAN [6]), the proposed MSSR method does not perform the upsampling operation.
In the second step, an existing CNN-based SR model is used for offline SR without retraining, so a well-trained CNN-based SR model can be used directly. In the third step, the final SR result $I_{HR}$ is obtained by resampling the output of the CNN-based SR model to the target resolution using the bicubic interpolation method.
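The three steps can be sketched as follows. This is only an illustrative outline under stated assumptions: the function names `mssr` and `resize_nearest` are hypothetical, the pre-trained networks are represented by plain callables, the target size is assumed to be obtained by rounding $tH$ and $tW$, and a nearest-neighbour resampler stands in for bicubic interpolation to keep the sketch self-contained.

```python
from typing import Callable, Dict, List

Image = List[List[float]]  # one channel, stored as rows of pixel values


def resize_nearest(img: Image, out_h: int, out_w: int) -> Image:
    """Stand-in resampler (nearest neighbour). The paper uses bicubic."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def mssr(
    lr: Image,
    t: float,
    s: int,
    sr_models: Dict[int, Callable[[Image], Image]],
    pre_upsample: bool,
    resize: Callable[[Image, int, int], Image] = resize_nearest,
) -> Image:
    """Sketch of MSSR for target scale t using the integer-scale model s."""
    h, w = len(lr), len(lr[0])
    # Step 1: s has been selected by the caller. Models such as VDSR expect
    # an input that is already enlarged to sH x sW; models with a built-in
    # upsampling module (e.g. RCAN) take the LR image directly.
    net_in = resize(lr, s * h, s * w) if pre_upsample else lr
    # Step 2: run the existing pre-trained model without retraining.
    mid = sr_models[s](net_in)
    # Step 3: resample the sH x sW result to the target size
    # (rounding to whole pixels is an assumption of this sketch).
    return resize(mid, round(t * h), round(t * w))
```

For example, with a 4 × 4 input, $t = 1.5$, and $s = 2$, the intermediate image is 8 × 8 and the final output is 6 × 6, regardless of whether the model pre-upsamples its input.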

C. OPTIMAL INTEGER SCALING-FACTOR SELECTION BASED ON SIMILARITY
Because existing CNN-based models perform well only on some integer-scaling factors, we first calculate the optimal integer-scaling factor $s$ ($s \in S = \{2, 3, 4\}$) for each non-integer-scaling factor. When the test data change from integer-scaling factors to non-integer-scaling factors, current models need to be retrained from scratch to achieve good performance on non-integer-scaling-factor SR. However, it is expensive to retrain and store models for arbitrary scales. As [30] pointed out, if the data in the source and target domains are similar, a model trained in the source domain can transfer to the target domain. Therefore, we compute the optimal integer-scaling factor based on the similarity of data.
We use two quality assessment metrics for image super-resolution, Energy Similarity (ES) [31] and Frequency Similarity (FS) [32], to measure the similarity between integer- and non-integer-scaling low-resolution images. ES measures global visual information degradation between two patches. Therefore, we use ES to compute the similarity between the LR patches of non-integer-scaling factors and those of integer-scaling factors (including 2, 3, and 4). The ES is computed from the energy falloff curves as defined in [31], where $n$ is the number of scales when we decompose image patches into steerable pyramids, $E(j)$ stands for the energy difference between adjacent scales, and $E_{in}$ and $E_{non}$ represent the energy difference vectors of the image patch for the integer-scaling factor and the non-integer-scaling factor, respectively. A smaller ES value indicates that the two patches are more similar. According to [32], FS is computed in the Fourier domain from $H_{in}$, $H_{non}$, $V_{in}$, and $V_{non}$, which represent the horizontal ($H$) and vertical ($V$) spectrum components of the image patch for the integer-scaling factor (subscript in) and the non-integer-scaling factor (subscript non). The range of FS values is [0, 1], and the larger the FS value, the more similar the two patches are.
Table 1 shows the similarity between LR images of non-integer-scaling factors and those of integer-scaling factors (including 2, 3, and 4) on the BSD100 dataset [33]. The best value for each scaling factor is shown in red, and the second-best value is shown in blue. The scaling factor range is (1, 4], and the interval is 0.1. As shown in Table 1, when the scaling factor t is in the range of (1, 2.3], the LR images are most similar to those of s = 2; when t is in the range of (2.3, 3.4], the LR images are most similar to those of s = 3; and when t is between 3.5 and 4, the LR images are most similar to those of s = 4 and very similar to those of s = 3. The computed optimal integer-scaling factor used in all experiments in the paper is shown in Table 2.
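The selection rule summarized above can be written as a small lookup. This is a sketch only: the thresholds are taken from the ranges quoted in the text for Table 1, the function name `select_integer_scale` is hypothetical, and the values actually used in the experiments are those of Table 2, which may differ in detail.

```python
def select_integer_scale(t: float) -> int:
    """Map a target scale t in (1, 4] to an integer-scale model s in {2, 3, 4}.

    Thresholds follow the ranges quoted in the text for Table 1 (assumption:
    the evaluated grid has a step of 0.1, so 3.5 is the first value above 3.4).
    """
    eps = 1e-9  # tolerance for scales produced by floating-point arithmetic
    if not (1.0 < t <= 4.0 + eps):
        raise ValueError("t must lie in (1, 4]")
    if t <= 2.3 + eps:
        return 2
    if t <= 3.4 + eps:
        return 3
    return 4
```

With this rule, a target scale of 1.4 is handled by the ×2 model, 2.4 by the ×3 model, and 3.5 by the ×4 model, while the integer scales 2, 3, and 4 map to their own models.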

D. CNN-BASED SUPER-RESOLUTION
The proposed MSSR method performs bicubic upsampling on the input image $I_{LR} \in \mathbb{R}^{3 \times H \times W}$ to obtain $I_{input} \in \mathbb{R}^{3 \times sH \times sW}$ as the input of existing CNN-based SR models such as VDSR [8], DRRN [17], and MemNet [18]. For other CNN-based SR models, such as DBPN [22], RDN [24], and RCAN [6], the proposed MSSR method does not perform upsampling and directly uses the input image $I_{LR}$ as the input $I_{input}$. Our method directly uses an existing model, without spending considerable time and computing resources on retraining the model. The rebuild operation can be expressed as $I_{HR}^{mid} = F_{\times s}(I_{input}; \theta)$, where $F_{\times s}(\cdot; \theta)$ is a mapping function from an LR image to an HR image using an existing model with scaling factor $s$. The model is composed of convolution layers and rectified linear unit activation functions. The size of the HR image $I_{HR}^{mid}$ obtained at this stage is $sH \times sW$.
VOLUME 8, 2020

E. SAMPLING TO TARGET RESOLUTION
This step establishes a linear mapping between the intermediate HR image $I_{HR}^{mid}$ and the final full-resolution image $I_{HR}$, and is implemented using the bicubic interpolation method. In this step, the HR image $I_{HR}^{mid}$ from the $sH \times sW$ space is mapped to the target image $I_{HR}$ of resolution $tH \times tW$. The bicubic interpolation method guarantees the quality of reconstruction and simultaneously reduces the computational complexity.
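To make the resampling step concrete, the following sketch resamples a 1-D signal with the cubic convolution (Keys) kernel that underlies bicubic interpolation; a 2-D image would be resampled by applying it along rows and then along columns. Several points are assumptions of this sketch rather than details from the paper: the kernel parameter a = -0.5, the pixel-centre coordinate alignment, edge replication at the borders, and the absence of any anti-aliasing prefilter when shrinking.

```python
import math
from typing import List


def keys_kernel(x: float, a: float = -0.5) -> float:
    """Cubic convolution kernel with support [-2, 2]."""
    x = abs(x)
    if x <= 1.0:
        return (a + 2.0) * x**3 - (a + 3.0) * x**2 + 1.0
    if x < 2.0:
        return a * x**3 - 5.0 * a * x**2 + 8.0 * a * x - 4.0 * a
    return 0.0


def resample_cubic_1d(signal: List[float], out_len: int) -> List[float]:
    """Resample a 1-D signal to out_len samples with the cubic kernel."""
    in_len = len(signal)
    scale = out_len / in_len
    out = []
    for i in range(out_len):
        # Source coordinate of output sample i (pixel centres aligned).
        x = (i + 0.5) / scale - 0.5
        base = math.floor(x)
        acc = 0.0
        # Four nearest source samples contribute to each output sample.
        for k in range(base - 1, base + 3):
            idx = min(max(k, 0), in_len - 1)  # replicate edge samples
            acc += signal[idx] * keys_kernel(x - k)
        out.append(acc)
    return out
```

Because the four kernel weights always sum to one, a constant signal is preserved exactly, and with a = -0.5 a linear ramp is reproduced away from the borders.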

IV. EXPERIMENTS

A. DATASETS
Existing CNN-based SR methods typically use 291 images as the training set, 91 of which are from the dataset proposed by Yang et al. [15] and 200 from the Berkeley segmentation dataset (BSD) [33]. Recently, a high-quality (2K resolution) dataset, DIV2K, has been released for image restoration applications [34]. DIV2K consists of 800 training images, 100 validation images, and 100 test images. Several CNN-based methods (such as DBPN [22], RDN [24], and RCAN [6]) are trained with the 800 training images and 5 validation images. As in [6], [8], [16], [19], [21], [24], to make full use of the training data, those CNN-based SR methods applied data augmentation by rotating images by 90, 180, and 270 degrees and flipping images horizontally. The SR methods were evaluated on three widely used benchmark datasets: Set5 [35], Set14 [36], and BSD100 [33]. These datasets consist of natural scene images.
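The augmentation described above can be illustrated with a short sketch. It assumes that rotations and the horizontal flip are combined, giving eight variants per image; the text does not state whether the cited methods combine them or apply them separately, and the function names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # one channel, stored as rows of pixel values


def rot90(img: Image) -> Image:
    """Rotate an image by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip_horizontal(img: Image) -> Image:
    """Mirror an image left to right."""
    return [row[::-1] for row in img]


def augment(img: Image) -> List[Image]:
    """Return the rotations by 0, 90, 180, and 270 degrees, each with and
    without a horizontal flip (assumption: the two operations are combined)."""
    variants = []
    current = img
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rot90(current)
    return variants
```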
The ground truth images were downscaled by bicubic interpolation to generate LR/HR image pairs for testing. As in [1], [8], [16]-[21], [23], we converted each RGB color image into the YCbCr color space and processed only the Y channel, while the color components were enlarged using bicubic interpolation. For CNN-based SR methods that operate directly on the RGB color space (such as DBPN [22], RDN [24], and RCAN [6]), this color transform is not needed.
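As an illustration of the luminance extraction, the sketch below computes the Y channel with the ITU-R BT.601 studio-range formula for 8-bit RGB values. The paper does not state which YCbCr variant was used, so this particular formula (and the function name `rgb_to_y`) is an assumption.

```python
def rgb_to_y(r: float, g: float, b: float) -> float:
    """Luma of an 8-bit RGB pixel (inputs in [0, 255], output in [16, 235]).

    Assumption: BT.601 studio-range coefficients; the paper does not specify
    the exact conversion.
    """
    return 16.0 + (65.481 * r + 128.553 * g + 24.966 * b) / 255.0
```

Under this convention, black maps to 16 and white maps to 235.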

B. EXPERIMENTAL RESULTS
We combined the proposed MSSR method for continuous-scaling-factor SR with six CNN-based SR models, including VDSR [8], DRRN [17], MemNet [18], DBPN [22], RDN [24], and RCAN [6]. In this paper, PSNR and SSIM were used as evaluation indicators. For DBPN, RDN, and RCAN, each method provides several models for integer-scaling-factor SR. Due to the page limit, we only show the results of three RCAN pre-trained models: RCAN x2, x3, and x4. The results of the DBPN and RDN pre-trained models are similar to those of the RCAN pre-trained models. Tables 3-5 show the average PSNR and average SSIM values on the Set5 [35], Set14 [36], and BSD100 [33] datasets, respectively. We also calculated the dataset size-weighted average PSNR and SSIM values across the three datasets and report the results in Table 6. For the experimental results of each scaling factor, the top line represents the PSNR values, and the bottom line represents the SSIM values. We can see that the PSNR and SSIM values of the SR results obtained by our MSSR method are higher than those obtained by directly applying the existing CNN-based SR models on various scales. In addition, when t is in the range of (1, 2), the performance improvements achieved by our MSSR method are significant.
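For reference, PSNR follows the standard definition $10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$. The sketch below is a minimal version for single-channel images with a peak value of 255; details of the evaluation protocol, such as any border cropping, are not given in the text and are not modelled here.

```python
import math
from typing import List

Image = List[List[float]]  # one channel, stored as rows of pixel values


def psnr(reference: Image, test: Image, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    squared_error = 0.0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for a, b in zip(ref_row, test_row):
            squared_error += (a - b) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak**2 / mse)
```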
In Fig. 5 and Fig. 6, we show some examples of the proposed MSSR method using existing pre-trained VDSR [8], DRRN [17], MemNet [18], DBPN [22], RDN [24], and RCAN [6] models. In Fig. 5 and Fig. 6, (a) is the original full-resolution image, (b)-(g) are SR results of VDSR, DRRN, MemNet, RCANx2, RCANx3, and RCANx4, respectively, (h) is obtained directly using bicubic interpolation, and (i)-(n) are the results of the proposed MSSR method when combined with VDSR, DRRN, MemNet, DBPN, RDN, and RCAN models, respectively. As shown in the magnified regions in Fig. 5, the black and white stripes of the shirt in the proposed MSSR's results are clearer. It can also be seen from the magnified regions in Fig. 6 that the feather pattern of the bird is not over-sharpened, which is closer to the original image in Fig. 6(a). Overall, the proposed MSSR method works better than the original pre-trained CNN-based SR models for the majority of scaling factors and significantly improves the performance of SR on non-integer-scaling factors.

C. COMPARISON WITH META-SR
We compared the proposed MSSR method with Meta-SR [7] on Set5, Set14, and BSD100 datasets. The experimental results are shown in Tables 7-9, respectively. Table 10 shows the dataset size-weighted average PSNR and SSIM values across 3 datasets. The highest PSNR and SSIM values on each scaling factor in Tables 7-10 are formatted in red and bolded, and the second-highest PSNR and SSIM values are formatted in blue and bolded.
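The dataset size-weighted average can be computed as in the following sketch, which weights each per-dataset mean by the number of images in that dataset (5, 14, and 100 for Set5, Set14, and BSD100). The function name and the numbers in the usage note are illustrative only and are not results from the paper.

```python
from typing import Dict

# Number of images in each benchmark dataset.
DATASET_SIZES: Dict[str, int] = {"Set5": 5, "Set14": 14, "BSD100": 100}


def size_weighted_average(
    per_dataset: Dict[str, float],
    sizes: Dict[str, int] = DATASET_SIZES,
) -> float:
    """Average per-dataset scores, weighting each by its number of images."""
    total_images = sum(sizes[name] for name in per_dataset)
    weighted_sum = sum(per_dataset[name] * sizes[name] for name in per_dataset)
    return weighted_sum / total_images
```

For instance, made-up per-dataset PSNR means of 30.0, 28.0, and 27.0 dB would give (5 · 30.0 + 14 · 28.0 + 100 · 27.0) / 119 ≈ 27.24 dB, which is dominated by BSD100.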
As shown in Tables 7-10, when the scaling factor is between 1 and 2, Meta-SR achieves better performance. When the scaling factor is greater than or equal to 2, the proposed MSSR usually achieves better performance.
In Fig. 7, we show some comparison results of Meta-SR and the proposed MSSR. As shown in the magnified regions, the results of the proposed MSSR are clearer than those of the Meta-SR. Specifically, Fig. 7(b) seems blurred as compared to Fig. 7(c). In Fig. 7(e), the rod on the top of the building becomes hard to recognize, while that in Fig. 7(f) is clear.
It is worth mentioning that the proposed MSSR method does not require retraining any models from scratch with arbitrary-scaling-factor data and can be effectively combined with a variety of existing SR models. Furthermore, although Meta-SR trains a model for a large number of scales, it still has to bridge the gap between a large but finite set of training scales and an unlimited range of arbitrary scales.

D. DISCUSSION
We use the typical VDSR method [8] as the CNN-based SR method in the second step to verify the proposed optimal integer-scaling factor selection based on similarity that is described in Section III-C. Table 11 shows the SR results of the proposed MSSR method combined with VDSR on the BSD100 dataset [33]. The scaling factor range is (1, 4], and the interval is 0.1. The last three columns correspond to the results when using s = 2, 3, and 4 as the optimal integer-scaling factor for all non-integer-scaling factors, respectively.
As shown in Table 11, the optimal integer-scaling factors determined by experiments are consistent with the computed ones for most of the tested scaling factors. The computed optimal integer-scaling factor used in all experiments in the paper is shown in Table 2. Regardless of the value of s, the results of the proposed method are no worse than those of the bicubic and VDSR methods, and the gain is close when s = 2 or 3. Experimental results show that the optimal s we computed applies well to other CNN-based super-resolution models. To further validate the performance of the proposed MSSR method, we compared it with a re-trained CNN-based SR model. Table 12 shows the experimental results of the re-trained VDSR model and the proposed MSSR method for continuous-scaling factors in the range from 1.10 to 1.90. As shown in Table 12, the proposed MSSR method shows more stable and better performance than the re-trained VDSR model. When the scaling factors are 1.10 and 1.20, the performance of the re-trained VDSR model is worse than that of bicubic interpolation. The experimental results show that it is hard to re-train the CNN-based models straightforwardly. In the future, we will investigate domain knowledge to improve the performance of the CNN-based SR models for continuous-scaling factors.
As shown in Table 3, for each SR method, the performance improvement for a small scaling factor is usually large, while that for a large scaling factor is usually small. As shown in Table 10, the experimental results of the Meta-SR method show the same trend as those of our method. Specifically, both Meta-SR and the proposed method usually achieve a larger performance improvement for a small scaling factor than for a large one, and their improvements for large scaling factors are usually small. We believe that future work on SR for arbitrary scaling factors should focus on addressing large scaling factors.

V. CONCLUSION
After observing that the existing CNN-based SR models are not effective on non-integer-scaling-factor image SR tasks, particularly when the scaling factor is between 1 and 2, we extended CNN-based SR models from discrete to continuous scale and proposed a multiple-scaling-based SR method that requires no retraining of networks. The proposed MSSR method not only retains the superiority of CNN-based models on discrete-scaling-factor SR but also achieves good performance on continuous-scaling-factor SR. Experimental results demonstrate that the proposed MSSR method can effectively extend a variety of pre-trained CNN-based SR models to continuous-scaling-factor SR. In the future, we will consider performing blind SR on real raw images with arbitrary scaling factors.