An image super-resolution reconstruction method based on PEGAN

To improve the accuracy and efficiency of super-resolution reconstruction of low-resolution images, a multi-scale, multi-stage self-similar fusion image super-resolution reconstruction algorithm is proposed. First, the low-frequency features of the image are obtained by a feature extraction network and used as the input of two sub-networks, one of which obtains the structural feature information of the low-resolution image through an encoding network. Second, the high-frequency features are obtained through a multi-path feedforward network composed of stage feature fusion units, in which each fusion unit fuses the features of several consecutive layers of the network and selects effective features adaptively. Finally, the residual blocks and the batch normalization layers in the discriminator, which are unfavorable to image super-resolution, are removed, and spectral normalization is used in both the generator and the discriminator to reduce computational overhead and stabilize model training. Experimental results show that our method raises the average PSNR of the reconstructed images on the Remo-A and Remo-B test datasets to 27.95 dB and the average SSIM to 0.771 without losing much speed. The model borrows the connection pattern of dense networks to strengthen the connections between network layers and links the whole network through multi-path connections, so as to make full use of hierarchical features, extract more high-frequency information and improve reconstruction quality. The texture of the reconstructed results is more realistic and the brightness more accurate, better matching human visual perception, which demonstrates the effectiveness and superiority of the algorithm: it is not only faster in reconstruction, but also improves the quality of the reconstructed image effectively.


I. INTRODUCTION
Image super-resolution reconstruction is a classical problem in the field of computer vision. It aims to reconstruct high-resolution (HR) images with rich detail from one or more low-resolution (LR) input images, and is therefore widely used in medical imaging, satellite remote sensing, video surveillance and other fields. However, any low-resolution image corresponds to countless high-resolution images, so image super-resolution reconstruction is an ill-posed problem. To solve it, interpolation-based methods [1][2], reconstruction-based methods [3][4] and learning-based methods [5][6][7][8][9][10] have been proposed.
High-resolution images have higher pixel density, provide more detail such as hue, shape and texture, and give a better visual experience. According to the input information, existing SR methods can be divided into two categories: reconstructing a high-resolution image from multiple low-resolution images (multi-frame super-resolution) and reconstructing a high-resolution image from a single low-resolution image (single-frame super-resolution). Single image super-resolution (SISR) uses one image for reconstruction, which avoids the difficulty of acquiring image sequences and the lack of temporal information [11]. The lack of correlated information across multiple frames makes it difficult to obtain prior information about image degradation, which has become the main difficulty of image super-resolution reconstruction. With the increasing demand for video and display quality, image resolution, an important aspect of video quality, has gradually moved to the 4K level, and the resolution of terminal display devices has increased accordingly [12]. However, low-resolution (LR) legacy video sources cannot achieve a good display effect on high-resolution (HR) display devices.

(This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2022.3142049, IEEE Access. Chang-Wei Jing: An image super-resolution reconstruction method based on PEGAN.)
Therefore, super-resolution technology for converting video sources from LR to HR has attracted wide attention. Among learning-based methods, the idea of completing the reconstruction task by training a neural network gained traction with the proposal of SRCNN. SRCNN stacks three convolution layers to learn the nonlinear mapping between high- and low-resolution image pairs and thereby complete the super-resolution task. However, because SRCNN has few layers, it can extract only shallow information, and the reconstructed image is blurred. Building on SRCNN, Wang et al. extended the network depth to 20 layers to extract deep features [13] and took the residual between the high- and low-resolution images as the learning target to reduce the amount of computation. Kim et al. proposed DRCN (deep recursive convolutional network), which processes data cyclically with a recursive network and adds skip connections between layers to alleviate gradient vanishing and explosion and to make full use of low-resolution information [14]. Tong et al. used dense blocks to alleviate gradient vanishing, support feature reuse, reduce parameters and ease training [15]. These methods proved the effectiveness of deeper networks, residual learning and inter-layer skip connections for reconstructing various types of images, including faces; however, since faces are a class of objects with regular structure, ordinary super-resolution algorithms do not achieve the best effect on them. For face images, Yu et al. proposed UR-DGN, which introduced generative adversarial networks into face super-resolution for the first time and trained the model on approximately aligned frontal face images, solving the artifacts in the reconstruction results caused by the low input resolution and the difficulty of locating facial features [16].
However, because the dataset contains a single category of images, reconstructing side views and other poses produces large errors in the face image. Zhu et al. reconstructed face images with clear contours by alternately optimizing the face reconstruction task and dense face correspondence field estimation [17], but the non-end-to-end network structure makes the learning process more complex. Song et al. divided the input image into five regions according to the facial features, generated reconstructed patches through corresponding convolutional neural networks, and used a region enhancement method to produce a complete reconstruction with rich facial detail [18]; because of the stitching, however, the reconstruction is uneven and shows obvious discontinuous regions. Li et al. proposed GFRN, which takes the image to be reconstructed and another high-resolution guide image of the same person as joint input, estimates a dense flow field to align the guide image with the image to be reconstructed, and uses the aligned image to complete the reconstruction [19]. The results preserve the subject's features well, but the requirement of a high-definition image of the same person to assist reconstruction is difficult to meet in practical applications.
With the proposal of generative adversarial networks (GANs), people began to apply them to various computer vision tasks [20]. In 2016, Liu et al. first used a GAN for image super-resolution reconstruction (SRGAN), which improves the visual quality of the generated image [21]. In 2018, Xin et al. removed all BN layers in the generator and replaced the original basic blocks with residual dense blocks to train a very deep network, which further improves visual quality [22]. Inspired by this, this paper, based on the SRGAN model, takes the hinge loss used in support vector machines (SVMs) as the objective function, and uses the more stable and noise-resistant Charbonnier loss in the generator network in place of the L2 loss, which alleviates the speckle artifacts that the reconstructed image produces in smooth regions. At the same time, the residual blocks and the BN layers that normalize features in the discriminator are removed, and spectral normalization is used in both the generator and the discriminator to reduce computational overhead and stabilize model training [23]. For the activation function, the exponential linear unit (ELU), which is less complex and more robust to noise, replaces LeakyReLU in the discriminator. These measures further improve the visual quality of the image and bring it closer to human perception. To obtain a better super-resolution reconstruction effect, this paper designs the PEGAN super-resolution method, which can reconstruct images downsampled by a factor of four. PEGAN improves on the SRGAN framework by optimizing the activation function, the basic network structure and the loss function.
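Spectral normalization, mentioned above, constrains a layer by dividing its weight matrix by its largest singular value, which is typically estimated with power iteration. The following is a minimal NumPy sketch of that idea, not the paper's implementation (real implementations, e.g. in deep learning frameworks, keep the iteration vector persistent across training steps):

```python
import numpy as np

def spectral_norm(W, n_iters=100):
    """Estimate the largest singular value of W by power iteration."""
    rng = np.random.RandomState(0)
    u = rng.randn(W.shape[0])
    for _ in range(n_iters):
        v = W.T @ u
        v /= (np.linalg.norm(v) + 1e-12)
        u = W @ v
        u /= (np.linalg.norm(u) + 1e-12)
    return float(u @ W @ v)

def spectrally_normalize(W):
    """Divide W by its spectral norm so its largest singular value is ~1."""
    return W / spectral_norm(W)
```

After normalization, the layer's Lipschitz constant (with respect to the spectral norm) is bounded by 1, which is what stabilizes discriminator training.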
For face images, an end-to-end double-layer neural network based on depth residual modules is constructed to extract the high-frequency features of the image layer by layer and complete the reconstruction. For the particular structure of the face, a face information estimation module is added to the reconstruction network to assist reconstruction, constrain the results at the geometric level, and handle multi-pose face reconstruction effectively; mixed magnification factors are used to train the model and improve its generalization ability. This paper proposes a super-resolution reconstruction method based on a multi-stage feature fusion network, mainly aimed at the shortcomings of many classical reconstruction network structures, i.e., improving the reconstruction effect by changing the network structure. The model borrows the connection pattern of dense networks to strengthen the connections between layers and runs multi-path connections through the whole network, so as to make full use of hierarchical features, extract more high-frequency information and improve reconstruction quality.

II. RELATED WORK
A. Generative adversarial networks
Inspired by the two-player zero-sum game, Ian Goodfellow proposed the concept of the generative adversarial network in 2014; it consists of a special adversarial process in which two neural networks compete with each other. The specific structure is shown in Figure 1; our work is strongly inspired by this representative GAN model [24]. The generator network G(z) captures the latent distribution of the real sample data and generates new samples, while the discriminator network D(x) is a binary classifier that tries to distinguish real data from the fake data created by the generator [25], i.e., to judge whether its input is real data or a generated sample. The discriminator outputs a scalar in [0, 1] representing the probability that the data are real. Training a GAN is a minimax game whose optimization goal is a Nash equilibrium [26]: the samples produced by the generator become indistinguishable from real samples, and the discriminator cannot tell generated data from real data; at that point the discriminator assigns probability 0.5 to the generator's output being real. The adversarial process between generator and discriminator is shown in formula (1):

min_G max_D V(D, G) = E_{x~P_data(x)}[log D(x)] + E_{z~P(z)}[log(1 - D(G(z)))]    (1)

where V(D, G) is the objective function of GAN optimization, x is a sample drawn from the real data, D(x) is the probability that the real data are judged to be real by the discriminator, z is a random noise signal, and the generator G(z) receives the input z drawn from the probability distribution P(z) and feeds its output to the discriminator network D(x).
The discriminator part is shown in formula (2):

max_D E_{x~P_data(x)}[log D(x)] + E_{z~P(z)}[log(1 - D(G(z)))]    (2)

For a fixed generator, the first term says that for a real sample x the output D(x) should be as large as possible, because the closer the discriminator's result on a real sample is to 1 the better. For fake samples, D(G(z)) should be as small as possible, i.e. 1 - D(G(z)) as large as possible, because their label is 0. The generator part is shown in formula (3):

min_G E_{z~P(z)}[log(1 - D(G(z)))]    (3)

In this optimization there are no real samples; we want D(G(z)) to be as large as possible, i.e. 1 - D(G(z)) as small as possible. The coupled generative adversarial network (CoGAN) combines two GANs, each for one image domain. In essence, the image distribution learned by a GAN should be close to the training-sample distribution P(x); then any noise fed into the trained generator can produce an image that looks sufficiently like the training samples.
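The two objectives above can be sketched numerically. The following minimal NumPy example (an illustration of formulas (2) and (3), not the paper's implementation; the generator term uses the common non-saturating form) evaluates the losses given discriminator probabilities:

```python
import numpy as np

def d_loss(d_real, d_fake):
    """Discriminator loss: maximize log D(x) + log(1 - D(G(z))).
    d_real, d_fake are discriminator probabilities in (0, 1);
    the negation turns the max into a minimizable loss."""
    return -(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

def g_loss(d_fake):
    """Generator loss (non-saturating form): maximize log D(G(z))."""
    return -np.mean(np.log(d_fake))
```

At the Nash equilibrium described above, D outputs 0.5 everywhere and the discriminator loss equals 2·log 2.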
In traditional domain adaptation, a domain adapter must be learned or trained using the source domain and the corresponding target domain. By adding unsupervised weight-sharing constraints to the network and solving for the joint distribution over the marginal distributions, CoGAN can learn a joint distribution in an unsupervised way when there are no corresponding image pairs in the two domains, for example the joint distribution of two different attributes of a picture such as color and depth.

B. Dense convolutional networks
In deep networks, the problems of gradient vanishing and gradient dispersion become more severe as the number of layers increases. Residual networks [27] and highway networks [28] proposed improved architectures for this problem. Although these algorithms differ in network structure and training process, their key idea is to create short paths from early feature layers to later ones. To ensure the maximum flow of information between layers, Huang et al.
proposed dense convolutional networks (DenseNet) [29]. The basic idea is that each layer should receive additional feature input from all preceding layers and pass its own feature maps to all subsequent layers for effective training. For a model with L layers, DenseNet therefore has L(L+1)/2 direct connections instead of the L connections of a traditional neural network, creating a deeper and more effective convolutional network; its dense connection mechanism is shown in Figure 2.

C. WGAN-GP
WGAN-GP performs better than the standard WGAN: it can stably train GAN architectures for image generation and language modeling, requires almost no hyperparameter tuning, and generates higher-quality samples than the standard WGAN at a faster convergence speed. WGAN-GP does not essentially change the structure of the GAN model; it improves the formulation of the objective function and the choice of optimization method.
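The objective-function change behind WGAN-GP is a gradient penalty evaluated at points interpolated between real and fake samples. A minimal NumPy sketch of that idea (illustrative only; the standard penalty weight λ = 10 is assumed, and gradient norms are taken as given rather than computed by backpropagation):

```python
import numpy as np

def gradient_penalty(critic_grad_norms, lam=10.0):
    """Two-sided WGAN-GP penalty: push the critic's gradient norm at
    interpolated samples toward 1."""
    g = np.asarray(critic_grad_norms, dtype=float)
    return lam * np.mean((g - 1.0) ** 2)

def interpolate(real, fake, rng):
    """Sample points on straight lines between a real and a fake batch,
    where the gradient penalty is evaluated."""
    eps = rng.uniform(size=(real.shape[0],) + (1,) * (real.ndim - 1))
    return eps * real + (1.0 - eps) * fake
```

When the critic is 1-Lipschitz (all gradient norms equal to 1), the penalty vanishes, which is exactly the constraint the Wasserstein formulation requires.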

D. Loss function

1) DISCRIMINATOR LOSS FUNCTION
In 2018, Jolicoeur-Martineau proposed that the discriminator should not only increase the probability that fake data are judged real, but also decrease the probability that real data are judged real, measuring each relative to the other batch [14]. The loss function of the discriminator is shown in equation (7):

L_D = -E_{x_r}[log(σ(C(x_r) - E_{x_f}[C(x_f)]))] - E_{x_f}[log(1 - σ(C(x_f) - E_{x_r}[C(x_r)]))]    (7)

where E_{x_r} and E_{x_f} denote the averages over the real high-resolution batch and the generated high-resolution batch respectively, C(·) is the raw discriminator output, and σ is the sigmoid function.
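Equation (7) can be evaluated directly from raw critic outputs. A minimal NumPy sketch of the relativistic average discriminator loss (illustrative; batch statistics are computed exactly as in the formula):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ra_d_loss(c_real, c_fake):
    """Relativistic average discriminator loss: real samples should look
    *more* real than the average fake, and fakes *less* real than the
    average real. c_real, c_fake are raw (pre-sigmoid) critic outputs."""
    real_term = np.log(sigmoid(c_real - np.mean(c_fake)))
    fake_term = np.log(1.0 - sigmoid(c_fake - np.mean(c_real)))
    return -(np.mean(real_term) + np.mean(fake_term))
```

When the critic cannot separate the batches at all (identical outputs), both sigmoids equal 0.5 and the loss is 2·log 2; a well-separating critic drives it toward 0.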

2) GENERATOR MODEL LOSS FUNCTION
Johnson et al. proposed the perceptual loss, which was extended in SRGAN [15]. The perceptual loss was previously defined on the activation layers of a pre-trained deep network, i.e., the distance between the two activations was minimized. In addition, PESRGAN introduces a perceptual index PI, the ratio of the variances of the Laplacian responses of the original image and the reconstructed image: the larger the index, the sharper the image and the more detail it contains. Using this index as a weight helps the generator produce clearer images; however, if the PI value is too large, too many false details are generated. The PI formula is shown in formula (8). PESRGAN redefines the loss function in combination with the relativistic discriminator theory. The reason for using features before the activation layer is that the activated features are very sparse: for the image "baboon", for example, only 11.17% of neurons are activated after the VGG19-54 layer, so the weak supervision provided by sparsely activated high-level features degrades performance, and using activated features also makes the brightness of the image differ from the real image. The advantage of fusing two layers of features is that they better clarify the direction of convergence at low loss. The generator loss is defined in equation (5), where L_G is the total generator loss, l_1 is the 1-norm content loss between the estimated image G(x_i) and the real image y_i, two weighting coefficients balance the different loss terms, L_RaG is the loss relative to the generator, and PI is the perceptual index parameter.
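The PI described above can be sketched from its verbal definition. The following NumPy example is an assumption-laden illustration, not the paper's formula (8): it takes PI to be the ratio of the variances of the discrete Laplacian responses of the two images, with a 4-neighbour Laplacian and edge padding:

```python
import numpy as np

def laplacian(img):
    """4-neighbour discrete Laplacian of a 2-D grayscale image."""
    p = np.pad(img, 1, mode='edge')
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
            - 4.0 * img)

def sharpness_index(img, ref):
    """Hypothetical 'PI': ratio of Laplacian-response variances;
    larger means the image is sharper relative to the reference."""
    return np.var(laplacian(img)) / (np.var(laplacian(ref)) + 1e-12)
```

Under this reading, a flat (blurred) image scores near 0 against a textured reference, and an image scores 1 against itself, matching the text's claim that a larger PI indicates higher definition.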

3) DISCRIMINATOR NETWORK
The discriminator network of the PESRGAN algorithm is based on the classical VGG19 network and can be simplified into two modules: feature extraction and linear classification. The feature extraction module includes 16 convolution layers, each followed by LeakyReLU as the activation function. To avoid gradient vanishing and enhance the stability of the model, a BN layer is used after every convolution except the first. The discriminator must judge the authenticity of the input sample image, i.e., an image classification problem. To avoid slow training and the risk of overfitting, this paper uses global average pooling (GAP) in place of the fully connected layers used in most image classification models: the pixel average of each feature map produced by the feature extraction module is computed, all the values are fused linearly, and the result is passed through a sigmoid activation function, from which the network classifies the input sample image. Training the discriminator helps the generator reconstruct results closer to the real high-resolution image.
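The GAP head described above can be summarized in a few lines. This NumPy sketch is illustrative (the weight vector and bias are hypothetical stand-ins for learned parameters): it averages each feature map to a single value, applies one linear unit and a sigmoid:

```python
import numpy as np

def gap_head(feature_maps, weights, bias):
    """Global-average-pooling classifier head replacing FC layers.
    feature_maps: array of shape (C, H, W); weights: (C,); bias: scalar.
    Returns the real/fake probability."""
    pooled = feature_maps.mean(axis=(1, 2))   # one value per channel
    logit = pooled @ weights + bias           # single linear unit
    return 1.0 / (1.0 + np.exp(-logit))       # sigmoid
```

Compared with a fully connected layer over C·H·W inputs, this head has only C + 1 parameters, which is the overfitting/speed argument made in the text.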

III. METHOD
A. Reconstruction algorithm based on anchored neighborhood regression
In the SCDL algorithm, after the high- and low-resolution dictionary pair and the mapping matrix W are learned, the sparse representation coefficients can be computed from the input image, and image super-resolution reconstruction is then realized by the linear combination of the high-resolution dictionary D_H and the corresponding sparse coefficients α_H. In the reconstruction process, the objective function for the sparse representation coefficients is given in equation (8). From equation (8) it can be seen that, as long as the low-resolution dictionary is obtained by offline learning, the sparse coefficients can be computed from the features of the low-resolution image patches, and the image is then reconstructed by equation (13), which improves the reconstruction speed. In the ANR algorithm, however, the high- and low-resolution sparse representation coefficients are regarded as identical. Considering the difference between the two, this paper improves the ANR algorithm and uses the mapping matrix W obtained in the previous section to compute the high-resolution sparse coefficients α_H, improving the accuracy of the sparse representation and the quality of the reconstructed image. The high-resolution image patch is obtained by linearly combining the result of equation (14) with the high-resolution dictionary; the combined operator is called the projection matrix. Since the projection matrix in equation (10) is computed from all features of the whole training set without using the features of the input image, only a global projection matrix can be obtained, which makes it imprecise [14]. Therefore, taking the features of the input image into account, for the trained dictionary we multiply the input patch features with each dictionary atom, and the atom with the largest product is regarded as the most relevant. The k nearest neighbours of atom j are then selected in the high-resolution and low-resolution dictionaries respectively, and the projection P_j is used to reconstruct the high-resolution image patch. The high-resolution patches are then combined, the overlapping areas of the patches are averaged to obtain the high-frequency part of the high-resolution image, and finally this is added to the interpolated image to obtain the final reconstructed high-resolution image.
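The anchoring-and-projection step above can be sketched compactly. The following NumPy toy example assumes the standard ANR formulation (ridge regression over the k neighbours of the anchor atom); the correlation-based atom and neighbour selection follow the description in the text, and all shapes are hypothetical:

```python
import numpy as np

def anr_project(y, D_l, D_h, k=3, lam=0.1):
    """Map a low-res patch feature y to a high-res patch.
    D_l, D_h: dictionaries with one atom per row (aligned pairs).
    Anchor y to its most correlated low-res atom, take that atom's k
    nearest neighbours N_l / N_h, and apply the ridge projection
    P = N_h (N_l^T N_l + lam I)^(-1) N_l^T."""
    corr = D_l @ y                        # correlation with each atom
    j = int(np.argmax(np.abs(corr)))      # anchor atom index
    sims = D_l @ D_l[j]                   # similarity to the anchor
    nn = np.argsort(-sims)[:k]            # k nearest neighbours
    N_l, N_h = D_l[nn].T, D_h[nn].T       # columns are neighbour atoms
    P = N_h @ np.linalg.inv(N_l.T @ N_l + lam * np.eye(k)) @ N_l.T
    return P @ y
```

Because P depends only on the dictionaries, it can be precomputed per anchor atom offline, which is the source of ANR's speed advantage noted above.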

B. Perceptual loss
Unlike SRGAN, this paper adds a Charbonnier loss term and a TV regularization term to the perceptual loss. The Charbonnier loss is introduced to ensure the correctness of the low-frequency part of the image constructed by the generator. As shown in Table 1, most SR methods based on convolutional neural networks optimize the network with the L2 loss. The L2 loss directly optimizes the PSNR value, but it inevitably produces blurry predictions. In contrast, this paper uses the robust Charbonnier loss in place of the L2 loss to optimize the deep network, handling outliers and improving reconstruction accuracy: training time is shorter and the PSNR of the reconstruction result is higher. The mathematical expression of the Charbonnier loss function is shown in formula (17):

ρ(x) = sqrt(x² + ε²)    (17)
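Formula (17) applied to the pixel-wise residual gives the loss directly; the contrast with L2 is easy to see numerically. A minimal NumPy sketch (illustrative, with a typical ε = 10⁻³):

```python
import numpy as np

def charbonnier_loss(pred, target, eps=1e-3):
    """Charbonnier loss: a differentiable, outlier-robust variant of L1,
    rho(x) = sqrt(x^2 + eps^2), averaged over pixels."""
    diff = pred - target
    return np.mean(np.sqrt(diff * diff + eps * eps))

def l2_loss(pred, target):
    """Mean squared error, for comparison."""
    return np.mean((pred - target) ** 2)
```

For a residual containing one large outlier, the Charbonnier loss grows only linearly in the outlier while L2 grows quadratically, which is why it is less dominated by outliers during optimization.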

C. Double-layer cascaded convolutional neural network
In this paper, a super-resolution reconstruction method based on a double-layer cascaded neural network is proposed, consisting of two parts: a prior recovery network and a structure constraint network. The network flow is shown in Figure 3. Given a low-resolution input image I_LR, after bicubic interpolation it has the same size as the corresponding high-resolution image I_HR (128 × 128). The input image first passes through the first-layer prior recovery network to obtain a preliminary reconstruction:

I_SR = G(I_LR)    (19)

In the training stage, the reconstructed image I_SR and the real high-resolution image I_HR are used as the input of the discriminator, which randomly selects an image, judges its authenticity, and outputs the discrimination result. Figure 3 shows the structure of the double-layer cascaded neural network. Prior recovery network: in order to extract deep features from the low-resolution input and reconstruct high-frequency images with rich information while avoiding the gradient vanishing caused by a very deep network, the basic structure of the prior recovery network is built from residual blocks. Each residual block contains two 3 × 3 convolution layers, each followed by a batch normalization layer, with PReLU as the activation function. Because it is difficult to extract accurate face structure information when the resolution of the input image is too low, the network first extracts deep prior features from the input low-resolution face image; the structure is shown in Figure 1-A. The low-resolution input first passes through a 3 × 3 convolution kernel; to reduce computation, the stride is set to 2 so that the feature map is half the size of the input. Then a 20-layer stack of residual blocks extracts image features, a deconvolution layer enlarges the image back to the initial size, and a convolution layer reconstructs the image. Structure constraint network: by constraining the MSE loss and perceptual loss of the first-layer network, a good reconstructed image with rich high-frequency detail can be obtained. However, since facial structure information is not injected into the reconstruction process, the facial details of the reconstructed image may show obvious errors when reconstructing images with rich expressions or diverse poses. After decoding, two deconvolution layers restore the features to the initial size, and a final convolution layer completes the reconstruction. Through relay supervision on the key-point heat map, the facial consistency between the reconstructed image and the real image is strengthened; in addition, the HG (hourglass) module can provide more high-frequency information during reconstruction [27], which helps improve the result. To adapt to the scale diversity of optical remote sensing images, adaptive multi-scale super-resolution reconstruction extracts multi-scale information from the image and reduces information loss; however, the reconstruction model does not consider the specific needs of the target detection task, so the reconstructed image is not necessarily good for detection, and the detection task and the super-resolution model are optimized independently. The target detection result on an optical remote sensing image depends largely on image clarity and on sufficient texture information from which specific features can be extracted.

D. Multiscale concatenation framework and hierarchical search matching algorithm
Image texture is complex and diverse, and is affected by noise. Wavelet analysis can separate and process high- and low-frequency signals: it can filter image noise and reconstruct the original signal without losing important information. In addition, wavelet analysis can analyze the multi-resolution characteristics of the original signal, reasonably allocate the computational cost, and take into account the differences of signal characteristics at different scales. According to [20], the effect of multiple super-resolution reconstructions is better than that of a single one. Therefore, a multi-scale cascade framework is proposed in this paper, as shown in Fig. 5.

IV. EXPERIMENTS
A. Experimental setup
This section describes the basic setup of the experiments, explores the influence of the different components used in this paper in an ablation test, and analyzes the choice of K in the multi-path feedforward structure. Inspired by references [17][24], this paper performs data augmentation on the training set: each image is rotated by 90°, 180° and 270°, and flipped horizontally. Because the human eye is sensitive to the luminance channel, this paper only processes the luminance channel Y, while the chrominance channels Cb and Cr are enlarged by interpolation. In addition, as in references [15][16][17], the training set contains image blocks at different scales (×2, ×3 and ×4), so only a single model needs to be trained for super-resolution reconstruction at different scales.
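The rotation-and-flip augmentation described above yields eight variants per training image. A minimal NumPy sketch (illustrative; operates on a single 2-D array):

```python
import numpy as np

def augment(img):
    """Eight-fold augmentation: rotations by 0/90/180/270 degrees,
    each with and without a horizontal flip."""
    out = []
    for k in range(4):
        r = np.rot90(img, k)
        out.append(r)
        out.append(np.fliplr(r))
    return out
```

For any image without rotational or mirror symmetry, all eight variants are distinct, multiplying the effective training-set size by eight.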

B. Network effectiveness experiment
Considering that it is difficult to estimate the face structure directly from the low-resolution image, this paper takes the prior recovery network as the first layer to reconstruct the image and then improves the face reconstruction effect through the second-layer structure constraint network. To verify the effectiveness of the structure constraint network in the face reconstruction task and the rationality of the overall structure of the network, Net_1 and Net_2 are built for comparison. Net_1 removes the second-layer structure constraint network from the original network and acts directly on the output of the first-layer network; Net_2 swaps the order of the two networks, with everything else unchanged. Net_1 and Net_2 are trained with the same datasets and parameters as the method in this paper. Figure 6 shows the reconstruction results of Net_1 and our method at magnification factors 2, 3 and 4; representative areas are marked with red boxes and enlarged in the lower right corner. Compared with Net_1, which lacks the structure constraint network, our method produces clearer facial contours, more accurate and sharper facial features, and restores more high-frequency details (such as hair, jewelry and background). In addition, our method restores the facial features correctly, while the images reconstructed by Net_1 show errors of varying degrees at high magnification. Fig. 7 compares the effect of network-layer order on the PSNR of the reconstruction results: it shows the peak signal-to-noise ratio (PSNR) of Net_2 and our method on the CelebA test set. At every magnification our method achieves a better PSNR value, and the higher the magnification, the larger the PSNR gap between our method and Net_2.
Therefore, it is necessary to first reconstruct the input image with the prior recovery network and then optimize it to achieve better results.

C. Comparative experiment
In this paper, the low-resolution face image enlarged by bicubic interpolation is reconstructed at magnification factors 2-4, and the results are compared with the bicubic, SRGAN and WGAN methods. PSNR and the structural similarity index (SSIM), commonly used in image processing tasks, are used for quantitative comparison; the results are shown in Table 2. When the magnification factor on the Remo-A dataset is 2, 3 and 4, the PSNR of our method is higher than that of the other methods by 1.90 dB, 2.34 dB and 2.58 dB respectively, and the SSIM is higher by 0.0441, 0.0677 and 0.0827. Although the PSNR and SSIM values decrease to varying degrees as the magnification increases, our method recovers more detail at high magnification than the other methods. On the Remo-B dataset, the average PSNR increases by 1.24 dB and the SSIM by 0.040. Because CelebA is chosen as the training set, the numerical improvement on the Helen test set is smaller. Compared with SRGAN and SRWGAN, our model is more lightweight and requires less training time. The methods above are next compared with our reconstruction in terms of visual effect. Figure 8 compares the Remo-A dataset at magnification factors 2, 3 and 4: at every factor, our method reconstructs clear facial contours, and compared with the other algorithms it restores facial features closer to the original image with finer texture detail. Although the sharpness of the reconstructed image gradually decreases as the magnification factor increases, the higher the magnification, the larger our advantage in reconstruction quality, which is consistent with the quantitative indices. Figure 6 compares the Remo-B dataset at a magnification factor of 3.
Clear reconstructed images can still be obtained.
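The two metrics used in the comparison above have standard definitions; a minimal sketch of how they might be computed (PSNR in dB, and a simplified single-window SSIM with the usual constants, rather than the windowed SSIM normally used in published benchmarks):

```python
import numpy as np

def psnr(ref, rec, max_val=255.0):
    """Peak signal-to-noise ratio in dB between reference and reconstruction."""
    mse = np.mean((ref.astype(np.float64) - rec.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

def ssim_global(ref, rec, max_val=255.0):
    """Simplified global SSIM (one window over the whole image).
    Benchmark SSIM is usually computed over sliding Gaussian windows
    and averaged; this sketch keeps only the core formula."""
    x = ref.astype(np.float64)
    y = rec.astype(np.float64)
    c1 = (0.01 * max_val) ** 2  # standard stabilizing constants
    c2 = (0.03 * max_val) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

A uniform error of 1 gray level gives an MSE of 1 and hence a PSNR of about 48.13 dB for 8-bit images, which is a useful sanity check when reproducing table values.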

D. Results and analysis
PESRGAN experiments were conducted on a host with an Intel Core i7-6800K CPU @ 3.40 GHz, an NVIDIA GTX 1080 Ti GPU, and 16 GB of memory. The batch size is 16, i.e., 16 images are used for each gradient-descent step. Owing to the GPU memory available for the experiment, during training the generator input is a 48×48×3 low-resolution patch, which two upsampling layers enlarge to a 192×192×3 high-resolution output; the discriminator takes the 192×192×3 high-resolution image as input and outputs a binary classification. At test time, the input size is M×N×3 and the output size is (4M)×(4N)×3, i.e., a high-resolution image upsampled by a factor of 4.
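The size bookkeeping described above can be sketched as follows, using nearest-neighbour repetition as a stand-in for the learned upsampling layers (the paper's generator uses trained layers, not repetition; only the shape arithmetic is illustrated here):

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling: a shape-level stand-in for one
    learned upsampling layer of the generator."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def generator_output_shape(h, w):
    """Trace an M x N x 3 input through the two 2x upsampling layers,
    giving the (4M) x (4N) x 3 output size stated in the text."""
    x = np.zeros((h, w, 3))
    x = upsample2x(x)  # first upsampling layer: 2x
    x = upsample2x(x)  # second upsampling layer: 4x overall
    return x.shape
```

With the 48×48×3 training patch this yields the stated 192×192×3 output, and an arbitrary M×N×3 test input yields (4M)×(4N)×3.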
First, the experiment combines the LeakyReLU and Swish activation functions, modifies the basic generator network structure of the ESRGAN algorithm, and compares the generator loss of this experiment with the feature loss of the ESRGAN generator, as shown in Figure 10 (using a pre-trained model). Compared with the representative ESRGAN, our model has a clear advantage in training time. Figure 10 shows that the generator loss of PESRGAN converges slightly faster than that of ESRGAN. Although the Swish function is more complex than LeakyReLU, the improvement of the generator's basic RRDB block removes a large number of parameters, which speeds up the algorithm; introducing the Swish activation makes the model nonlinear, smooth, and non-monotonic. The loss of the improved algorithm ends up slightly lower than that of ESRGAN, which shows that the changes to the activation function and network structure are effective. To further verify the model, the statistics of the reconstructed high-resolution fields are checked. Figure 11 shows the probability density functions of the reconstructed velocity components and pressure. All reconstruction results are consistent with the statistics obtained from the original images, which shows that the reconstruction ability of the model is effective.
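The two activation functions compared above can be written out directly; a minimal scalar sketch (framework implementations such as PyTorch's LeakyReLU and SiLU/Swish operate elementwise on tensors, and the slope 0.2 here is a common default, not a value stated in the text):

```python
import math

def leaky_relu(x, alpha=0.2):
    """Piecewise-linear activation used in the original ESRGAN blocks:
    identity for positive inputs, small slope alpha for negative ones."""
    return x if x > 0 else alpha * x

def swish(x, beta=1.0):
    """Swish: x * sigmoid(beta * x). Smooth everywhere and non-monotonic:
    it dips below zero for moderately negative x, then approaches zero
    as x -> -inf."""
    return x / (1.0 + math.exp(-beta * x))
```

The non-monotonicity mentioned in the text is visible numerically: swish(-1) is more negative than swish(-5), whereas LeakyReLU decreases monotonically on the negative axis.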

VI. Conclusion
This paper proposes a super-resolution reconstruction method based on a multi-stage feature-fusion network, aimed at a shortcoming shared by many classical reconstruction networks: improving the reconstruction effect solely by changing the network structure. The model borrows the connection pattern of dense networks and strengthens the connections between layers through multi-path connections across the whole network, so as to make full use of the network's hierarchical features, extract more high-frequency information, and improve reconstruction quality. Experimental results show that, compared with other methods, this method has clear advantages in PSNR and structural similarity, reconstructs images with better visual effect, and tests faster, which demonstrates its effectiveness. The method also has limitations: the current multi-stage fusion design is relatively simple. Future work should study feature-reuse networks and model-construction methods in more depth, adopt the idea of recursive learning to further optimize the model by reducing network parameters and increasing training samples, and explore better use and fusion of hierarchical features. Future research will also focus on reducing training parameters and network size while obtaining better reconstruction quality.