An Effective and Comprehensive Image Super Resolution Algorithm Combined With a Novel Convolutional Neural Network and Wavelet Transform

To further improve the reconstruction quality of image super-resolution algorithms, this paper proposes an image super-resolution algorithm combining deep learning and wavelet transform (ISRDW). The network design is not only structurally simple but also captures image details more effectively than other neural network architectures. At the same time, cross-connections and residual learning are used to reduce the difficulty of training the model. For the loss function, this paper combines the losses computed in the original image spatial domain and in the wavelet domain to strengthen the constraints on network training. Experimental results show that the proposed algorithm achieves better results across different datasets and different evaluation indexes.


I. INTRODUCTION
In the field of image processing, super-resolution is commonly used to reconstruct the details of low-resolution images in order to obtain higher-resolution images. In general, Super-Resolution (SR) is a digital image processing technology that reconstructs a high-resolution image from one or more low-resolution images. Single Image Super-Resolution (SISR) focuses on how to reconstruct the local details of a high-resolution image from a single low-resolution image, and has developed into an important research direction in the field of image processing [1], [35]. Because it can recover high-frequency details, this technology is widely used in applications that require a large amount of detailed information, such as medical imaging [2], satellite imaging [3], face authentication [4]-[8], and public security monitoring.
The instance-based SR method has been shown to achieve better results by using large image datasets to learn the mapping from low-resolution (LR) images to high-resolution (HR) images. Many machine learning algorithms, including dictionary learning [9], local linear regression [10], and random forests [11], have been applied in this field. In recent years, methods based on convolutional neural networks (CNNs) have been widely used in computer vision tasks due to their powerful learning ability, and have made impressive progress in object recognition, segmentation, optical flow, and super-resolution. Although CNN-based super-resolution methods represent a great breakthrough compared with traditional methods, many problems remain. The training of most existing SR methods [11]-[18] relies on a per-pixel mean squared error in the image space to make the network output as close to the HR image as possible, but this approach tends to produce blurry and overly smooth output lacking detailed information. Moreover, current work is only suitable for small, specific scaling factors (such as ×2 or ×4). Therefore, SISR remains to be further explored and developed. (The associate editor coordinating the review of this manuscript and approving it for publication was Yongqiang Cheng.)
The wavelet transform has proven to be an efficient feature extraction algorithm and is often used to represent and store multi-resolution images [19], [34]. As shown in Figure 1, it can represent the context and texture information of an image at different levels. Wavelet coefficients are themselves spatially sparse, which makes network learning easier.
Chen et al. [36] proposed a method combining wavelet transform and convolutional neural networks (CWT-CNN). However, it uses four convolutional neural networks working serially, which undoubtedly increases the time complexity, and its results are not very good (see TABLE 3); in addition, its results on the ×4 and ×8 scales are not given. DWSR [20] converts the reconstruction of HR images into inferring a series of relevant wavelet coefficients of the HR image by means of the wavelet transform. This method achieves simpler computation and faster speed with the same precision as the current optimal very deep super-resolution network (VDSR) algorithm. The proposed method improves on this basis. In terms of input, in order to reduce the amount of computation and realize real-time processing from LR to HR, we directly take LR images as the input of the network. In terms of network architecture, our framework is divided into three parts: a feature extraction network, an inference network, and a reconstruction network. In order to better capture context information and infer the missing details, the feature extraction network cascades residual blocks [21] and passes the output of each residual block, via cross-connections, as input to the inference network. In addition, the number of wavelet coefficients (i.e., the number of channels in the final output of the network) is no longer a fixed value, but changes with the scale. In terms of the loss function, this paper considers not only the residual loss between the wavelet coefficients but also the residual loss between the SR image and the HR image after the final inverse wavelet transform, imposing a double constraint to achieve better results.
The main contributions of this paper are as follows: (1) We propose a deep image super-resolution algorithm in the wavelet domain to improve the detail of image reconstruction. (2) Different from previous algorithms, our method uses a multi-step strategy: the wavelet coefficients corresponding to the high-resolution image are first inferred, and then the image is reconstructed. (3) In order to improve the accuracy of image reconstruction, the loss functions of the spatial domain and the wavelet domain are combined to learn the parameters of the deep neural network. (4) Experiments on different datasets demonstrate the effectiveness of the proposed method. This section has given an overview of methods for the single-image super-resolution problem and briefly summarized the basic ideas of our method. Section II summarizes existing super-resolution methods based on convolutional neural networks. Section III describes the framework of the method and the design of the loss function. Section IV compares this method with other related methods on different datasets, and summarizes its advantages and disadvantages. Section V concludes the paper and offers a preliminary discussion of future research directions worthy of attention.

II. RELATED WORKS
Inferring and synthesizing high-resolution images from observed low-resolution images is a typical ill-posed inverse problem. Existing algorithms can be divided into two categories according to their technical means: reconstruction-based methods and learning-based methods [22]. Reconstruction-based SR methods usually require sub-pixel alignment of LR image sequences to obtain the motion offset between HR images, thus constructing spatial motion parameters in the observation model and applying different constraints to solve for the HR image. Learning-based SR methods use prior knowledge to obtain the mapping relationship between LR and HR images through training on given samples, and then reconstruct the HR image. This type of method includes three steps: training set construction, feature learning, and high-frequency detail reconstruction. Training set construction obtains the corresponding LR images by downsampling the HR images and applying other degradation operations; feature learning learns the mapping relationship between LR and HR images; high-frequency detail reconstruction recovers the HR image from the LR image according to the learned mapping. According to the source of the training samples and the matched feature domain, these methods can be divided into those based on image self-similarity, neighborhood embedding, and sparse representation. In methods based on image self-similarity, the training sample set comes from the input image itself; in methods based on neighborhood embedding and sparse representation, the training sample set comes from an external database and has nothing to do with the input image. However, the feature extraction and expression ability of traditional learning models is limited, which largely limits the improvement of super-resolution reconstruction quality.
In recent years, deep learning-based methods have been introduced to solve SR problems due to their strong ability to learn knowledge from large-scale data.
With the rapid development of deep convolutional neural networks in high-level vision tasks, a large number of CNN-based methods have also been applied to low-level vision tasks such as image super-resolution and image denoising. SRCNN [12] is the first model to introduce convolutional neural networks into the field of image super-resolution reconstruction. Its network structure consists of three convolution layers, which perform feature extraction, representation, and reconstruction of image blocks respectively. By introducing a convolutional neural network, this method significantly improves the reconstruction accuracy over traditional methods. Although SRCNN works well, it still suffers from a lack of contextual information, single-scale amplification, and slow convergence. To address these problems, some researchers proposed VDSR [13] (very deep super-resolution network). Based on the VGG [23] network structure designed for image classification, this method trains on the residuals between HR and LR images and uses a higher learning rate to accelerate convergence. At the same time, through weight sharing, it realizes multi-scale amplification with fewer network parameters and better reconstruction performance. In addition, DRCN [14] adds recursive connections on the basis of VDSR to realize information feedback and context correlation between image layers, further improving the results; at the same time, the model is compressed into 5 layers to reduce the difficulty of training.
The above methods all use convolutional neural networks to learn the mapping from middle-resolution (MR) images to HR images, where the MR images are obtained by upsampling the LR images with bicubic interpolation. In order to realize a direct mapping from LR to HR, FSRCNN [13] uses a deconvolution layer to replace the bicubic interpolation operation of the SRCNN model. With the upsampling operation removed, the model can learn a direct mapping from LR images to HR images and achieves a speedup of more than 40×. In FSRCNN, the convolutional layers share weights across different magnification factors, so a single model can handle different scales. ESPCN [16] builds on FSRCNN by designing a sub-pixel convolution layer to perform the upsampling operation; this effectively reduces the total computational complexity and synthesizes clean images without checkerboard artifacts. LapSRN [18] is one of the recently proposed single-image SR methods. This model uses a cascaded framework based on the Laplacian pyramid for feature extraction and image reconstruction, and uses the Charbonnier loss function instead of the L2 loss to achieve a better SR reconstruction effect.
The CNN-based super-resolution methods above operate in the spatial domain of the image and aim to output reconstructed pixel values directly, which usually makes the results blurred and over-smooth. To address this problem, PLSR [11] substituted a perceptual loss for the per-pixel difference loss when optimizing the SR network in order to preserve more semantic information. Although this strategy does not achieve good results on the Peak Signal-to-Noise Ratio (PSNR) metric, it is visually more realistic and can bring better details and edges. SRGAN [24] combines the perceptual loss with a Generative Adversarial Network (GAN) on the basis of PLSR to generate more realistic and sharper images. This method uses the generator network of the GAN to produce HR images, and the discriminator network to judge whether the generated images are realistic. However, its perceptual loss, which is based on the VGG classification network, cannot accurately capture the details necessary for SR tasks. Therefore, SRPGAN [25] proposed a perceptual loss based on the discriminator network of the GAN model and used the Charbonnier loss function. The super-resolution images reconstructed by this method are sharper and more realistic.
To sum up, there are mainly three different approaches to this problem: (1) designing different network architectures to reconstruct high-resolution images; (2) using perceptual losses and GAN generative models to produce more realistic and sharper images; (3) dealing with the problem in a transform domain. The ability of the wavelet transform to separate the coarse content and the details of an image is closely related to the SR problem of restoring image details from an input LR image. Therefore, our method transforms to the wavelet domain for SR reconstruction. At present, there are some wavelet-based methods for the SR problem [13], [26]-[28], but most of them focus on multi-image SR, using multiple frames of LR images to infer the missing details of the HR image. For single-image SR, although some wavelet-domain interpolation methods have been studied, their limited training and simple prediction procedures are not enough to deal with ordinary input images, and their SR results are far worse than those of deep learning-based super-resolution methods.
DWSR [20] is the first method to combine the complementary information between the low-frequency and high-frequency sub-bands with a deep convolutional neural network in the wavelet domain. It transforms the problem of reconstructing the HR image into inferring a series of wavelet transform coefficients, so that the resulting image edges have fewer artifacts. However, the network used in this method is a plain stack of same-sized convolution layers, lacking information fusion between layers; there is no correspondence between the scale factor and the wavelet decomposition level; and the network output is trained only to approximate the residual between the wavelet decomposition coefficients of the HR image and those of the MR image. In order to better capture context information and infer more missing details, our method further improves on DWSR by directly using the LR image as the network input and designing a more reasonable network structure. In addition, the number of wavelet coefficients output by the proposed network is no longer a fixed value, but changes with the scale.

III. METHOD
A new deep image super resolution algorithm based on wavelet domain is proposed, which combines the idea of wavelet transform with the deep residual network, and combines the image space loss and wavelet coefficient loss to strengthen the constraint of network training. This model can be applied to the super-resolution reconstruction of images, and can separate the features of each level of images for training and capture more missing details.
A. WAVELET TRANSFORM
Figure 1 shows the results of a two-dimensional wavelet transform. Intuitively, if the first wavelet image after the transform is regarded as the LR image, then the other wavelet images are exactly the missing details we intend to obtain; this is why the wavelet transform is introduced into the SISR problem in this paper. The emphasis of the proposed method is the effectiveness of the wavelet transform for super-resolution reconstruction, so the simplest Haar wavelet is sufficient to describe the information in different frequency bands; of course, other wavelets can also be used. HR images are transformed into a series of same-sized wavelet images using the Haar wavelet transform, which serve as the training target of the convolutional neural network. As shown in Figure 2, higher-level transforms repeat the low-pass filtering, high-pass filtering, and downsampling operations in a cyclic manner.
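As a concrete illustration, a single level of the 2-D Haar transform described above can be sketched in a few lines of NumPy. This is an illustrative implementation with the orthonormal 1/2 scaling, not the paper's actual code, and sub-band naming conventions vary between libraries:

```python
import numpy as np

def haar_dwt2(img):
    """Single-level 2-D Haar transform of an even-sized grayscale image.
    Returns four half-sized sub-bands: approximation (LL) and horizontal,
    vertical, and diagonal detail images (LH, HL, HH)."""
    a = img[0::2, 0::2]  # the four pixels of each 2x2 block
    b = img[0::2, 1::2]
    c = img[1::2, 0::2]
    d = img[1::2, 1::2]
    ll = (a + b + c + d) / 2.0
    lh = (a - b + c - d) / 2.0
    hl = (a + b - c - d) / 2.0
    hh = (a - b - c + d) / 2.0
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: reassemble the image from its sub-bands."""
    h, w = ll.shape
    img = np.empty((2 * h, 2 * w))
    img[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    img[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    img[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    img[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return img
```

The transform is perfectly invertible, which is what allows the reconstruction network to recover the SR image from inferred coefficients without loss.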

B. NETWORK ARCHITECTURE
The proposed method reconstructs the super-resolution image by mapping from the spatial domain to the wavelet domain. In order to extract features more effectively and reduce computational complexity, LR images are directly taken as the input of the network and finally mapped to the wavelet domain of HR images. The whole process from LR to HR reconstruction actually involves multiple networks, but we treat them as one network for unified and effective learning. The learning of the whole network proceeds in multiple stages: in the first stage, the inference network outputs a series of wavelet images; in the second stage, an inverse wavelet transform is applied to the results of the first stage to obtain the final super-resolution (SR) image. Finally, the wavelet coefficient loss and the image-space pixel loss are propagated back through the whole network as gradients for updating. As shown in Figure 3, the proposed neural network structure includes three parts: a feature extraction network, an inference network, and a reconstruction network. The feature extraction network extracts features from the low-resolution image; the inference network expresses the extracted features as the difference between the wavelet coefficients of the HR image and those of the corresponding MR image; and the reconstruction network reconstructs the SR image using the inverse wavelet transform. In order to capture more missing details, this paper constructs a robust loss function based on the wavelet coefficients and the pixel loss in image space to measure the similarity between SR and HR and optimize the proposed neural network. The loss function is described in detail below.

1) FEATURE EXTRACTION NETWORK
The feature extraction network takes LR images as input and represents them as a series of feature maps through the forward propagation of the neural network. It is composed of several cascaded residual blocks. Each residual block consists of two convolution blocks with the same kernel size and number of filters, and its output is the sum of its input and the result of the two successive convolution blocks. Each residual block acts as a unit: the output of each unit is passed to the next unit and simultaneously cross-connected as input to the inference network. All convolutional layers share the same kernel size of 3 × 3. In order to keep the size of the feature maps consistent with the input, the stride and padding are set to 1. At the same time, in order to obtain more and richer information, the number of filters per convolutional layer increases as the network deepens.

FIGURE 3. The proposed ISRDW diagram. The yellow region is the feature extraction network, the green region is the inference network, and the orange region is the reconstruction network. In addition, the red part performs loss calculations during network training.
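A minimal PyTorch sketch of such a residual unit and its cross-connections is given below. The channel width and block count here are illustrative placeholders (the paper's actual configuration grows the filter count from 64 to 256 across blocks, as described in Section IV):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """One unit of the feature extraction network: two 3x3 convolutions
    (stride 1, padding 1) plus an identity skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, stride=1, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)  # residual learning eases training

class FeatureExtractor(nn.Module):
    """Cascade of residual blocks; each block's output is both passed on
    and collected as a cross-connection for the inference network."""
    def __init__(self, channels=64, n_blocks=3):
        super().__init__()
        self.blocks = nn.ModuleList(
            [ResidualBlock(channels) for _ in range(n_blocks)])

    def forward(self, x):
        skips = []
        for blk in self.blocks:
            x = blk(x)
            skips.append(x)  # cross-connection to the inference network
        return x, skips
```

Because stride and padding are both 1 with a 3 × 3 kernel, every feature map keeps the spatial size of the LR input, as required for the later wavelet-domain output.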

2) INFERENCE NETWORK
The inference network takes the output of the feature extraction network as input; because its dimensionality is relatively large, a 1 × 1 convolution is used to reduce the feature dimension. Meanwhile, in order to ensure that the inferred wavelet images have the same size as the LR image, all convolution layer parameters in the inference network are set consistent with those of the feature extraction network, that is, the kernel size is 3 × 3 and the stride and padding are 1. Because the wavelet decomposition coefficients are highly independent, relationships between channels are not considered. The number of final output channels of the network for a given scale is the square of the scaling factor, and each channel corresponds one-to-one to a wavelet image. As shown in Figure 3, the final output of the network is added to the wavelet decomposition coefficients of the MR image and then converted back to the original image space through the inverse wavelet transform. Like a typical residual network, the proposed model also learns a residual output. Because wavelet images at different scales have different sizes and the number of output channels differs accordingly, multiple networks need to be learned. However, only the parameters of the last convolution layer differ when training images of different scales; all other convolution layers share weights across scales. Therefore, only one scale's network needs to be trained from scratch, and the networks for other scales can be fine-tuned from the trained model.
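The scale-dependent output size described above follows directly from the mapping between the scaling factor, the decomposition level, and the number of wavelet images; a tiny helper makes the relationship explicit (illustrative only, not part of the paper's code):

```python
import math

def wavelet_config(r):
    """Map a scaling factor r (a power of two, r >= 2) to the wavelet
    decomposition level m = log2(r) and the number of wavelet images
    (output channels per color channel) W_N = r**2."""
    assert r >= 2 and r & (r - 1) == 0, "the method handles scales of the form 2^n"
    return int(math.log2(r)), r * r

for r in (2, 4, 8):
    m, wn = wavelet_config(r)
    print(f"scale x{r}: decomposition level {m}, {wn} wavelet images")
```

This also makes clear why a ×3 scale does not fit the scheme: 3 is not a power of two, so no integer decomposition level exists.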

3) RECONSTRUCTION NETWORK
The reconstruction network takes as input the sum of the inference network's output and the wavelet coefficients of the upsampled LR image, and generates the corresponding SR image from this series of wavelet images by the inverse wavelet transform, giving the final result. Based on the intermediate results (a series of wavelet coefficients) and the final SR image, we propose a flexible and more constrained loss function to optimize the network, which consists of two parts: the wavelet coefficient loss and the image-space pixel loss. The total loss function is defined as:

L(θ) = λ·L_w(θ) + (1 − λ)·L_s(θ)        (1)

where λ and 1 − λ represent the weights of the wavelet coefficient loss L_w and the image space loss L_s respectively, which will be explained in detail in Section IV.
(1) Wavelet coefficient loss. In this paper, the input LR image is denoted x, the corresponding HR label image y, and the MR image upsampled from the input LR image by bicubic interpolation y_b. θ denotes the network parameters to be optimized. A general single-image super-resolution network learns the mapping between a given low-resolution input x and a high-resolution image y. The network proposed here instead learns the relationship between the wavelet coefficients of the low-resolution input x and those of the high-resolution image y, so that the network output is as close as possible to the wavelet decomposition coefficients of the corresponding high-resolution image. The scaling factor is defined as {r, r ≥ 2}, the wavelet decomposition level as m, and the number of wavelet coefficients after the transform as W_N; the mapping relationship between them is m = log₂ r, W_N = r².
Meanwhile, the wavelet decomposition coefficients of the images y and y_b are expressed as C_y = DWT(y) and C_yb = DWT(y_b) respectively. The difference of their coefficients after decomposition (i.e., the residual) can be calculated as:

ΔC = C_y − C_yb        (2)

The result of Equation (2) is the learning goal of the neural network, namely f_θ(x) ≈ ΔC. The most commonly used loss function in image space is the per-pixel mean squared error between the HR image and the SR image. This paper also adopts this form, but with the difference that the operation is carried out on the wavelet coefficients corresponding to the image, that is:

L_w(θ) = (1/n) Σ_i Σ_j ‖f_θ(x_i)_j − ΔC_{i,j}‖²        (3)

where n is the batch size, i indexes the i-th image in the batch, and j indexes the j-th coefficient (j = 1, …, W_N) in the wavelet coefficient sequence.
(2) Image space loss. The proposed network learns the difference between the wavelet coefficients of the MR image and those of the HR image. Applying the inverse wavelet transform to the network output gives the learned residual image:

I_res = IDWT(f_θ(x))        (4)

After the residual image is obtained, the final super-resolution image I_SR can be obtained by adding it to the MR image y_b upsampled from the original input by bicubic interpolation:

I_SR = I_res + y_b        (5)

Considering the usual loss calculation in image space, and in order to achieve a balance between texture and smoothness, the original image-space loss is added on top of the wavelet coefficient loss, and the residual loss in image space is calculated as:

L_s(θ) = (1/n) Σ_i ‖I_SR,i − y_i‖²        (6)
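Putting the two loss terms together, the combined objective can be sketched in NumPy as follows (a batch-free, illustrative version, not the actual training code):

```python
import numpy as np

def combined_loss(pred_coeffs, target_coeffs, sr_img, hr_img, lam=0.99):
    """Weighted sum of the wavelet coefficient loss and the image space
    loss, L = lam * L_w + (1 - lam) * L_s. lam = 0.99 is the weighting
    the paper settles on in Section IV."""
    l_w = np.mean((pred_coeffs - target_coeffs) ** 2)  # wavelet-domain MSE
    l_s = np.mean((sr_img - hr_img) ** 2)              # image-space MSE
    return lam * l_w + (1 - lam) * l_s
```

With λ close to 1, the wavelet term dominates, while the small image-space term still penalizes any reconstruction error that survives the inverse transform.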

IV. EXPERIMENT
A. IMAGE SETS
In this paper, 800 images from the DIV2K dataset are used as the training set; every image has at least one axis (vertical or horizontal) with 2K pixels. A step size of 64 is used to cut the dataset into 128 × 128 image blocks, yielding about 500,000 blocks for network training. In each training batch, 256 high-resolution image blocks are randomly selected as labels, and bicubic interpolation is used for downsampling to obtain the low-resolution image blocks used as network input. For the test set, experiments were carried out on five common benchmark datasets: SET5 [29], SET14 [30], BSD100 [31], URBAN100 [3], and MANGA109 [32]. The images in the first three datasets consist of natural scenes, the URBAN100 dataset contains challenging images of urban scenes with details in different frequency bands, and MANGA109 is a Japanese manga dataset.
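The cropping scheme above (128 × 128 blocks at a stride of 64) can be sketched as a simple sliding window; the image size below is a stand-in for a DIV2K frame, not a value from the paper:

```python
import numpy as np

def extract_patches(img, size=128, stride=64):
    """Slide a size x size window with the given stride over an HxWxC image
    and stack the resulting overlapping blocks."""
    h, w = img.shape[:2]
    patches = [img[i:i + size, j:j + size]
               for i in range(0, h - size + 1, stride)
               for j in range(0, w - size + 1, stride)]
    return np.stack(patches)

# A hypothetical 1080 x 2048 image yields
# ((1080-128)//64 + 1) * ((2048-128)//64 + 1) = 15 * 31 overlapping blocks.
p = extract_patches(np.zeros((1080, 2048, 3)), size=128, stride=64)
```

With the stride at half the block size, adjacent blocks overlap by 50%, which is how 800 source images expand into roughly half a million training blocks.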

B. EXPERIMENTAL SETUP
In training, the wavelet coefficients corresponding to the HR image are taken as the training target, with the wavelet decomposition level obtained through the mapping relationship with the scaling factor. Formula (1) is then used to calculate the loss on the results of each iteration. The Adam optimizer (parameter settings: β₁ = 0.9 and β₂ = 0.999) is used to update the weights and biases. The learning rate lr is initialized to 2e-4, and the learning-rate attenuation factor is set to 0.005. The input of the network can be a single-channel grayscale image or a three-channel color image; the latter is chosen for training in this paper.
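For reference, a single Adam update with the stated hyperparameters (β₁ = 0.9, β₂ = 0.999, lr = 2e-4) looks as follows in NumPy. This is a textbook sketch of the optimizer, not the paper's training loop, and the exact form of the 0.005 attenuation (assumed here to be a per-epoch multiplicative decay) is an assumption:

```python
import numpy as np

def adam_step(p, g, m, v, t, lr=2e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """Single Adam update for a scalar parameter p with gradient g.
    m, v are the running first/second moment estimates; t is the 1-based step."""
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)  # bias correction
    v_hat = v / (1 - beta2 ** t)
    p = p - lr * m_hat / (np.sqrt(v_hat) + eps)
    return p, m, v

p, m, v = 1.0, 0.0, 0.0
p, m, v = adam_step(p, g=1.0, m=m, v=v, t=1)
lr = 2e-4 * (1 - 0.005)  # assumed reading of the attenuation factor 0.005
```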
As shown in Table 1, the feature extraction network consists of three residual blocks, in which the number of filters doubles at each block from 64 to 256. The feature extraction network then fuses features through a 1 × 1 convolution to achieve dimensionality reduction, and the result serves as the input of the inference network. The inference network is also composed of three residual blocks; contrary to the feature extraction part, the number of filters halves at each block from 256 to 64, and a final convolution produces 3 × r² output channels (r is the scaling factor).
Training is first carried out with each loss alone, namely λ = 0 and λ = 1, and the weighting is then set based on these training results to obtain the optimal parameter. Finally, the parameter is set to λ = 0.99.

C. LOSS FUNCTION ANALYSIS
In this paper, two widely used indicators of image quality are used to judge the quality of SR results: peak signal-to-noise ratio (PSNR) and structural similarity (SSIM).
PSNR. Given a clean image I and a noisy image K of size m × n, the mean squared error (MSE) is defined as:

MSE = (1/(mn)) Σ_{i=0}^{m−1} Σ_{j=0}^{n−1} [I(i, j) − K(i, j)]²        (7)

PSNR is then defined as:

PSNR = 10·log₁₀(MAX_I² / MSE)        (8)

where MAX_I is the maximum possible pixel value of the image. In general, if pixel values are represented in B-bit binary, then MAX_I = 2^B − 1; for 8-bit (uint8) data the maximum pixel value is 255, and for floating-point data in [0, 1] it is 1.
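The PSNR definition above translates directly into a short NumPy function (illustrative; assumes both images share the same shape and pixel range):

```python
import numpy as np

def psnr(clean, noisy, max_val=255.0):
    """PSNR in dB for images with pixel values in [0, max_val]."""
    mse = np.mean((clean.astype(np.float64) - noisy.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```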
SSIM. SSIM is based on three comparative measures between samples x and y: luminance (l), contrast (c), and structure (s).
The three comparison functions are:

l(x, y) = (2µ_x µ_y + c₁) / (µ_x² + µ_y² + c₁)        (9)

c(x, y) = (2σ_x σ_y + c₂) / (σ_x² + σ_y² + c₂)        (10)

s(x, y) = (σ_xy + c₃) / (σ_x σ_y + c₃)        (11)

where µ_x and µ_y are the means of x and y, σ_x² and σ_y² are the variances of x and y, σ_xy is the covariance of x and y, and c₁ = (k₁L)² and c₂ = (k₂L)² are two constants that keep the denominators from being zero; generally c₃ = c₂/2 is taken, L = 2^B − 1 is the range of pixel values, and k₁ = 0.01 and k₂ = 0.03 are the default values. SSIM is then

SSIM(x, y) = l(x, y)^α · c(x, y)^β · s(x, y)^γ        (12)

If α, β, and γ are set to 1, we obtain

SSIM(x, y) = (2µ_x µ_y + c₁)(2σ_xy + c₂) / [(µ_x² + µ_y² + c₁)(σ_x² + σ_y² + c₂)]        (13)

In order to demonstrate the superiority of the loss function, a series of comparative experiments are conducted for different loss functions. Table 2 shows the results after training with the wavelet coefficient loss function, the image space loss function, and the combined loss function respectively (the scaling factor is ×4, on the SET5 and SET14 datasets). The experimental results show that the combined method produces the best results on both PSNR and SSIM, which further proves that the strategy of combining the two is feasible and effective.
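The simplified SSIM formula above can likewise be sketched as a single-window computation in NumPy. Note that the standard measure averages this quantity over local Gaussian windows; this global version is only a simplified illustration:

```python
import numpy as np

def ssim_global(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM with alpha = beta = gamma = 1 and c3 = c2 / 2,
    which collapses to the familiar two-factor formula."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den
```

An image compared against itself gives an SSIM of exactly 1, the maximum of the measure.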

D. CONTRAST EXPERIMENT
In this section, the proposed method is compared with other excellent methods. The model was tested at scaling scales of ×2, ×4, and ×8. Table 3 summarizes the comparison of results (PSNR and SSIM) between our method and other methods on different datasets (SET5, SET14, BSD100, URBAN100, and MANGA109) and at different scales (×2, ×4, and ×8). The three best results are shown in bold, underlined, and shaded. It can be seen that the method proposed in this paper does not exceed EDSR [33]. The reason is that, in terms of network structure, both use residual blocks as basic components; the difference is that the method in this paper uses only 6 residual blocks, whereas EDSR [33] uses 32. Previous work has shown that deep networks make it possible to predict target pixel information from more pixels, i.e., larger areas, and perform better in super-resolution reconstruction. In other respects, however, the method in this paper has certain advantages, such as faster training and inference speed and lower hardware requirements. Compared with the remaining methods, the experimental results of the proposed method are optimal for scaling scales ×4 and ×8. Figures 4-9 show a comparison of the visual effects of sample images selected from the above datasets after reconstruction by different methods. As shown in Figure 4 and Figure 7, the image lines generated by our method are not distorted and are clearer and sharper, whereas in the other results the edges of the lines are blurred, and some even exhibit ghosting.
As shown in Figure 5, the result produced by the method in this paper retains the edge information of letters well and looks more complete and standardized visually. As can be seen from Figure 6, the result of this paper retains the stone gap between the two bridge cavities well, while the results of other methods do not reflect this detail well.
As shown in Figure 8, the results of this paper restore the lines on the zebra more clearly, while other results are more ambiguous.
FIGURE 6. Super-resolution result illustration on BSD100 with scaling factor ×4.

In Figure 9, it can be clearly seen that the method in this paper restores the white lines on the sidewalk straight and clear, with sharper edges, while the other results are blurry, with jagged edges.
In general, it is clear from the example figures that, for both ×4 and ×8 scaling factors, the method in this paper retains details better than the other methods except EDSR [33], while making the edges sharper and clearer and giving a better reconstruction effect. Compared with SRGAN: the SR images generated by the adversarial network score poorly in terms of PSNR and SSIM, yet from the visual effect shown in Figure 10, the reconstructed SR images are more realistic and sharper. However, as shown in Figure 11, when the details are zoomed in, SRGAN's images contain more artifacts, resulting in many strange patterns. This is not conducive to subsequent high-level computer vision tasks.

E. BOUNDEDNESS ANALYSIS
It can be seen from the results that the proposed network improves greatly when the scaling factors are ×4 and ×8. It also works well for the small scaling factor of ×2, although it does not outperform the best existing methods there. Analyzing the characteristics of the wavelet transform, we find that as the scaling factor increases, the level of the wavelet transform also increases, the detail information of the image is further refined, and the missing details can be inferred better through the network, achieving a better effect. The experimental results further show that the wavelet decomposition method is more suitable for larger scaling factors. At the same time, considering the mapping relationship between the scaling factor and the number of output wavelet coefficients, the method in this paper can only deal with scaling factors of the form 2^n, and cannot handle the ×3 case used by general methods.

V. CONCLUSION
Traditional neural network methods usually reconstruct super-resolution images in the spatial domain, but these methods often lose important details in the reconstruction process. Given that the wavelet transform can separate the coarse and detailed features of image content, we propose an image super-resolution algorithm combining deep learning and wavelet transform. Different from other convolutional neural networks that directly derive high-resolution images, this method adopts a multi-stage learning strategy: first, the wavelet coefficients of the high-resolution image are inferred, and then the super-resolution image is reconstructed. In order to obtain more information, a flexible and scalable deep neural network with residuals nested within residuals is adopted. In addition, the proposed neural network model is optimized by combining the loss functions of the image spatial domain and the wavelet domain. The proposed method is tested on the SET5, SET14, BSD100, and URBAN100 datasets, and the experimental results show that the visual effect and PSNR of the proposed method are superior to those of related image super-resolution methods. In the future, we will continue to explore different image reconstruction methods, not only combining the spatial domain of the image with traditional and classical statistical methods, but also exploring approaches such as replacing the CNN with other deep neural networks, or replacing the residual blocks in this paper with dense blocks.
HUI YANG was born in Changsha, China, in 1983. He received the master's degree in software engineering from Central South University, in 2012.
He was appointed as an Associate Professor, in 2018. He presided over the teaching of multiple computer courses, studied the application of computer, information technology, and artificial intelligence. He has coauthored or published two monographs. He is presiding over application topics and published more than 20 articles. He is a domestic visiting scholar and a member of CCF.
YIBO WANG (Member, IEEE) was born in Sichuan, China, in 1982. He received the Ph.D. degree in high power microwave and discharge plasma from the National University of Defense Technology (NUDT), in 2012.
From 2012 to 2018, he was a Lecturer at NUDT. He joined the School of Electronic Information, Hunan Institute of Information Technology, in 2018, and was appointed as an Associate Professor, in 2019. He has coauthored or published three monographs and more than 30 articles. He is a member of the Chinese Society of Electronics. VOLUME 9, 2021