Structured Fusion Attention Network for Image Super-Resolution Reconstruction

To improve the extraction of image features, reduce parameter complexity, and enhance the reconstruction quality of image super-resolution (SR), a structured fusion attention network (SFAN) is proposed. Firstly, depthwise convolution is used to extract shallow features from low-resolution images, and different residual attention modules improve the structured residual of the encoder to extract more image features. Secondly, the features output by the encoder are refined, and the spatial attention module and the channel attention module are recombined according to an improved fusion attention method to provide better input features for PixelShuffle, thereby reconstructing the decoder output. Finally, by adding the low-frequency input to the network prediction, the input image is directly interpolated to the target size, which accelerates the convergence of the network's high-frequency residual and improves the reconstruction quality. At reconstruction magnifications of $\times 2$, $\times 3$ and $\times 4$, SFAN is compared with several state-of-the-art SR networks on the public datasets Set5, Set14, BSD100, Urban100 and Manga109. The experimental results show that SFAN achieves the best PSNR and SSIM values with a low parameter count, demonstrating that SFAN strikes a good balance between SR performance and parameter complexity.


I. INTRODUCTION
With the widespread application of pipeline systems, pipeline inspection and maintenance have become increasingly important [1]. Due to inherent limitations such as low light and weak signal transmission inside a pipeline, the captured images have low resolution, and the fringe characteristics of the image suffer in particular. At present, image super-resolution (SR) [2] technology is mainly used to improve image resolution. (The associate editor coordinating the review of this manuscript and approving it for publication was Gulistan Raja.) SR methods are mainly based on interpolation [3] (nearest-neighbor, bilinear and bicubic interpolation), reconstruction [4] (iterative back-projection, projection onto convex sets and maximum a posteriori estimation) and learning [5] (manifold learning, sparse coding and deep learning). Although interpolation is simple and easy to understand, it does not consider the image degradation model, which greatly limits the degree of image restoration. Reconstruction methods do consider the degradation model, but they must assume that the high-resolution image has undergone an appropriate transformation, and this transformation carries a degree of uncertainty. Learning methods use a large amount of training data to obtain a correspondence between low-resolution and high-resolution images, and learning this mapping greatly improves the quality of image SR. However, manifold learning and sparse coding focus on learning and optimizing a dictionary and give little consideration to a unified optimization framework, whereas deep learning networks learn the mapping between low and high resolutions directly in an end-to-end manner. Since Dong et al.
[6] applied deep learning to the field of image SR reconstruction, improved SR methods based on deep learning have been continuously proposed. Improving image SR with deep learning mainly involves the following steps: feature extraction, encoder design, decoder design, loss-function selection, and model training and validation. Among them, the design of the encoder and decoder is the core of improving an image SR method [7]. Dong et al. designed a fast SR convolutional neural network (FSRCNN) [8] based on the SR convolutional neural network (SRCNN) [7]; FSRCNN redefined the mapping layer by introducing direct learning of a deconvolution layer and adopted smaller filter sizes to obtain better restoration quality. Following the principles of VGGNet, Kim et al. [9] proposed a highly accurate single-image SR method (VDSR) using a very deep convolutional network. Compared with SRCNN, VDSR achieved better recovery quality, and the comparison showed that increasing the network depth can significantly improve image accuracy. As the network depth increases, the extracted image features become richer, but the number of input/output channels, the size of the convolution kernels, and the number of features all increase. The growth of these parameters can cause gradients to vanish or explode and reduces the computation speed of the network. To solve this problem, He et al. [10] proposed the residual neural network (ResNet), a major breakthrough in the history of deep networks: ResNet trains deeper networks through residual learning and thereby solves the degradation problem of deep networks. Subsequently, based on the design and combination of residual networks, researchers have optimized encoders and decoders to improve the reconstruction effect of image SR.
Yang et al. [11] improved the Dirac residual block to achieve global skip connections through sub-pixel convolution, thereby balancing the contributions of convolution and skip connections. Luo et al. [12] utilized the lattice block (LB) to optimize the residual block (RB), which reduced the model parameters by nearly half while maintaining SR performance. Wang et al. [13] proposed an adaptive weighted super-resolution network (AWSRN), which reduced the computational complexity of the SR model through effective residual learning. Li et al. [14] proposed an image super-resolution feedback network (SRFBN) to refine the feedback between low-frequency and high-frequency information and improve early image reconstruction. Liu et al. [15] proposed the residual feature distillation network (RFDN), which used multiple feature distillation connections for residual learning to keep the network lightweight. Anwar et al. [16] built a densely residual Laplacian network (DRLN) by cascading residuals on the residual structure to focus on learning high- and intermediate-level features. Li et al. [17] proposed a deep interleaved network (DIN) to splice and integrate low-level and high-level features and improve feature utilization. Ledig et al. [18] proposed a generative adversarial network for image super-resolution (SRGAN), which utilized a deep residual network to recover image textures from heavily down-sampled images. References [11]-[18] show that the key to improving image SR reconstruction is a good trade-off between SR performance and model parameter complexity. However, as the complexity of the network structure increases, the dimensionality of the feature maps keeps growing, that is, the number of channels representing the feature information of the entire image also keeps increasing.
In order to improve the network's processing of more image feature information, a network structure based on the attention mechanism [19] was proposed according to the reciprocity between feature channels. The introduction of the attention mechanism enabled the network to select the focal position and generate more discriminative features. The attention mechanism can effectively improve the convergence characteristics of the network.
The attention mechanism is mainly divided into the spatial domain, the channel domain and the mixed domain [19]. The spatial domain addresses the problem that key information cannot be identified because the pooling layers in a convolutional neural network (CNN) merge features directly. The channel domain improves the weighting with which feature information is collected. However, spatial-domain attention ignores the information of the channel domain, and channel-domain attention ignores the local information within each channel. Therefore, the mixed-domain attention mechanism, which combines the advantages of the two, has become a key research direction. Zhang et al. [20] proposed RCAN, which utilized a channel attention mechanism to rescale the features of each channel. RCAN identically maps low-frequency information to the back end of the decoder through a long skip connection (LSC), thereby ensuring the flow of information and accelerating training. Yang et al. [21] constructed an improved residual self-encoding attention mechanism network (RSAMSR) through multi-path convolution and added attention mechanism modules. RSAMSR distinguished high-frequency components from low-frequency components to obtain fewer network parameters and better reconstruction. Chen et al. [22] proposed an image super-resolution reconstruction method based on an attention mechanism and feature mapping to address the problem that low-frequency and high-frequency components are treated equally; this method used the feature-map attention mechanism to promote the mapping and transfer of low-frequency components to high-frequency components, and used the mapping relationship to restore more image details. Haris et al. [23] used iterative up- and down-sampling layers to propose a deep back-projection network (DBPN) to exploit the interdependence between low-resolution and high-resolution images. Zhang et al.
[24] proposed a deformable and residual convolution network (DefRCN) for image SR to overcome the inflexibility caused by the fixed geometric structure of standard convolution filters. DefRCN used a deformable residual convolution block (DRCB) to enhance the spatial sampling positions and reduce the computational cost. Liu et al. [25] proposed the multi-channel densely connected residual attention network (MCRAN), which used an attention mechanism to adjust channel characteristics and increase the collection of feature information. Yang et al. [26] designed the channel attention and spatial graph convolutional network (CASGCN); CASGCN used multiple channel attention and spatial graph (CASG) blocks to explore the potential relationships between image features and strengthen the network's representation ability. Mou et al. [27] proposed the CS-Net network, which uses a spatial attention module and a channel attention module to adaptively integrate local and global dependencies. Misra [28] proposed a triplet attention mechanism that used a three-branch structure to compute attention weights and achieved lightweight channel encoding. Dai et al. [29] developed a second-order attention network (SAN) to improve the representation ability of the convolutional neural network (CNN); SAN used a novel trainable second-order channel attention (SOCA) module to adaptively rescale the channel features. Wang et al. [30] proposed a dual residual attention module (DRAM) network, which restored high-frequency details in images by sharing information between the spatial and channel domains. Hu et al. [31] proposed a channel-wise and spatial feature modulation network (CSFM), which combined channel attention and feature attention into a residual block to modulate the representation of image features.
After encoding and decoding the feature information, the resolution of the feature map must be restored by upsampling. Vijay [32] constructed a deep fully convolutional network structure (SegNet) in which the decoder upsamples its lower-resolution input feature maps; SegNet thus eliminates the need to learn upsampling and reduces the computational complexity of the network. Lai et al. [33] constructed the Laplacian pyramid super-resolution network (LapSRN) with a transposed-convolution upsampling method. LapSRN reconstructs the sub-band residuals of high-resolution images, avoids bicubic interpolation as a pre-processing step, and greatly reduces the computational complexity of the network. PixelShuffle is an upsampling method designed specifically for image super-resolution. Shi et al. [34] used PixelShuffle to construct an efficient sub-pixel convolutional neural network (ESPCN) that super-resolves low-resolution (LR) data into the high-resolution (HR) space, which greatly reduces the number of network parameters.
At present, SR research focuses mainly on extracting richer feature information and restoring better image detail with fewer model parameters. According to references [20]-[31], although the attention mechanism adjusts the channel characteristics, the numbers of shallow and high-level channels remain unchanged, which is inflexible in practice [20]. At the same time, during decoding these networks mainly connect the channel attention and the spatial attention end-to-end by concatenation, a method that lacks intercommunication of image feature information. Therefore, from the extraction of low-level features to high-level features, there is still room for improvement in the utilization of network width, the classification of features, and the reduction of parameter complexity.
In response to this research trend, this paper proposes the SFAN model.
In SFAN, we use Swish as the activation function and introduce depthwise separable convolution to construct the residual block, which serves as the basic module of the attention residual group. Through structured processing of the attention residual group and the mixed-domain attention mechanism, the number of channels can grow with the receptive field, and the network width can be fully utilized to extract more image feature information. Inspired by DRAM [30] and CSFM [31], SFAN feeds the image features obtained by the encoder into the spatial attention module and the channel attention module respectively, and a new attention fusion method (channel merge) matches the high-level and shallow-level information of the image to each other. The upsampling characteristics of pixel reorganization are then used to recombine and strengthen the image features in the decoder. At the same time, inspired by RCAN [20] and DRSR [11], SFAN adopts nearest-neighbor interpolation upsampling to globally connect the shallow features of the input image with the enhanced features output by the network decoder through an LSC. The parameter complexity of the network is thereby reduced, and its convergence characteristics are improved.
In summary, the main contributions of this paper are as follows: (1) The amount of extracted image feature information is increased by structured methods, and the extracted features are refined by the attention residual module and the mixed-domain attention module. (2) A new fusion method (channel merge) is designed to splice the spatial attention module and the channel attention module by corresponding feature indices. This fusion attention method connects high-level and shallow-level feature information in the model to improve the SR reconstruction effect. (3) The design of the high-frequency residual effectively reduces the complexity of the model parameters. (4) Comparisons with stronger networks on five benchmark datasets verify that SFAN achieves good image SR reconstruction while keeping the number of model parameters low.
The paper is organized as follows: Section II is devoted to describing the composition architecture and implementation principle of SFAN. Section III conducts comparative experiments, including experimental conditions, ablation experiments and comparison of experimental results. Finally, some conclusions of this study are given in Section IV.

A. THE OVERALL STRUCTURE OF THE MODEL
The SFAN designed in this paper mainly consists of three parts: an encoder module, a decoder module and a high-frequency residual module.
The shallow features of the image are extracted from the input image using the depthwise separable convolution [35]. The encoder module is composed of different structured residual attention blocks and mixed-domain attention modules. Based on the residual theory, the output obtained by the encoder block is combined with the shallow features for reconstruction and decoding.
The decoder module is mainly based on the attention mechanism. The refined image feature information is evenly divided between the spatial attention module and the channel attention module for feature enhancement. After the outputs of the two attention modules are merged with the channel-merge method, the decoder reconstructs the image with the PixelShuffle method. The number of decoders is determined by the upsampling multiple.
The high-frequency residual module mainly adopts a global skip connection. Through the nearest-neighbor upsampling module, the enhanced output features are superimposed on the shallow features of the original input image; that is, the entire SR network forms an overall residual structure, and the network only needs to learn the high-frequency (detail) residual features, which speeds up convergence.
The overall structure of SFAN is shown in Figure 1, where blocks1, blocks2 and blocks3 are three different residual attention blocks, Conv is the convolution module, CPRM is the channel pixel restore module, W and H are the width and height of the feature map, and (C1, C2, C3) are the numbers of channels.
The deep stem conv is a convolutional layer with a 3 × 3 kernel, and its expression is as follows [36]:

$H_0 = f_{dsc}(I_{LR})$, (1)

where $f_{dsc}(\cdot)$ is the deep convolutional layer, $I_{LR}$ is the low-resolution input, and $H_0$ is the extracted feature map.

B. ENCODER BLOCKS
The receptive field of high-level features is large, and the extraction of high-frequency feature information requires more channels, which is also beneficial for the decoder to reconstruct more feature information [22]. The receptive field of shallow features is small, and there is no need to increase the number of channels to extract low-frequency feature information. Therefore, when extracting shallow feature information, the number of channels remains the usual 64 [26], and as the amount of feature information increases, the residual groups are stratified by multiplying the number of channels.
The attention residual group is composed of multiple residual blocks. The residual block in SFAN is improved from the standard residual block used by ResNets [10]. The structure of a single residual block is shown in Figure 2. The specific improvements include: (2) The activation function uses the Swish function (Eq. 2) [28] in place of the ReLU function [37], because Swish has no upper bound, so gradient saturation does not occur [38].
(3) The commonly used intermediate convolutional layers are mainly 3 × 3 conventional convolutions [11], [20], [39]. As shown by the Xception structure [3], the depthwise separable convolution, composed of a depthwise (DW) and a pointwise (PW) convolution, has lower operation cost and fewer parameters than traditional convolution. Therefore, the intermediate convolutional layer of the SFAN residual block adopts a 3 × 3 depthwise separable convolution layer (DSCL).
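As a concrete illustration of the parameter savings, the following PyTorch sketch builds a 3 × 3 depthwise separable convolution (the function name and layer sizes are illustrative assumptions, not the paper's exact configuration):

```python
import torch.nn as nn

def dscl(channels: int, kernel_size: int = 3) -> nn.Sequential:
    """Depthwise separable convolution: a per-channel (depthwise, DW)
    convolution followed by a 1x1 pointwise (PW) convolution.
    Illustrative sketch; not the paper's exact layer definition."""
    return nn.Sequential(
        nn.Conv2d(channels, channels, kernel_size,
                  padding=kernel_size // 2, groups=channels),  # DW: one filter per channel
        nn.Conv2d(channels, channels, kernel_size=1),          # PW: mixes channels
    )

# Parameter comparison for 64 channels:
# DW: 64*1*3*3 + 64 = 640, PW: 64*64 + 64 = 4160, total 4800,
# versus 64*64*3*3 + 64 = 36928 for a conventional 3x3 convolution.
sep_params = sum(p.numel() for p in dscl(64).parameters())
conv_params = sum(p.numel() for p in nn.Conv2d(64, 64, 3, padding=1).parameters())
```

For 64 channels the separable form uses 4,800 parameters versus 36,928 for the conventional layer, roughly an eight-fold reduction, which is the advantage the paragraph above refers to.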
(4) As the champion of the ILSVRC 2017 competition, SENet's standing in the field of attention mechanisms is well established [40]. As the core of the attention block, SENet mainly comprises two parts, Squeeze and Excitation; the structure of SENet is shown in Figure 3. Squeeze (Eq. 4) aggregates the feature maps over the spatial dimensions H × W to generate channel descriptors:

$z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)$, (4)

Excitation (Eq. 5) uses the channel self-gating mechanism to activate the learned samples of each channel and then reweights the features to obtain the output of SENet:

$s = \sigma\left(W_2\, \delta(W_1 z)\right)$, (5)

where $u_c$ is the c-th feature map, $\delta$ is the ReLU function, $\sigma$ is the Sigmoid function, and $W_1$ and $W_2$ are the weights of the two fully connected layers.
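The Squeeze and Excitation steps described above can be sketched in PyTorch as follows (a generic SE block; the reduction ratio and names are assumptions, not taken from the paper):

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation sketch: global average pooling (squeeze)
    yields a per-channel descriptor; two FC layers with a sigmoid
    (excitation) yield weights that rescale each channel."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        s = x.mean(dim=(2, 3))            # squeeze: B x C channel descriptor
        w = self.fc(s).view(b, c, 1, 1)   # excitation: per-channel weights in (0, 1)
        return x * w                      # reweight the feature maps
```

The block is drop-in: its output has the same shape as its input, so it can be appended to any residual block, as in blocks2 and blocks3.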
The three residual attention blocks (blocks1, blocks2, blocks3) are each composed of multiple residual blocks. Because the receptive field of shallow features is small, a single residual block in blocks1 contains no 3 × 3 DSCL or SENet module layer. As feature extraction deepens, the residual blocks in blocks2 and blocks3 do contain 3 × 3 DSCL and SENet module layers. To reduce the parameter complexity of the SFAN model, after extensive experiments, the numbers of residual blocks and the channel magnifications in (blocks1, blocks2, blocks3) are set to (6, 8, 6) and (×1, ×2, ×4), respectively. CPRM is a mixed-domain attention module whose architecture is described in Section C. The reconstruction magnification of CPRM in the encoder is 1 and PixelShuffle reconstruction is not performed; that is, CPRM only fuses and refines the feature information in the spatial attention module and the channel attention module to improve the extracted image features.
$\text{Swish}(x) = x \cdot \sigma(\beta x)$, (2)

$\sigma(x) = \frac{1}{1 + e^{-x}}$, (3)

where $\beta$ is a constant or a trainable parameter, and the Sigmoid function $\sigma(\cdot)$ maps variables into (0, 1) [41]. SFAN sets $\beta = 1$, under which Swish retains favorable gradient properties that the ReLU function lacks [38]. In Eq. 5, $z_c$ is the c-th element of the input $z$, and $s_c$ is the corresponding activated element.
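A minimal numeric sketch of the Swish activation under the standard definition x·sigmoid(βx) (pure Python; the function name is ours):

```python
import math

def swish(x: float, beta: float = 1.0) -> float:
    """Swish activation, assumed form x * sigmoid(beta * x).
    With beta = 1 this is the SiLU used in place of ReLU."""
    return x * (1.0 / (1.0 + math.exp(-beta * x)))
```

Unlike the Sigmoid itself, Swish is unbounded above: large positive inputs pass through almost unchanged, which is the "no upper bound" property mentioned above.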

C. DECODER MODULE
SFAN's decoder module is composed of CPRM and sub-pixel convolution modules; the decoder structure of SFAN is shown in Figure 4. To recover high-level detail for SR, the network should be more discriminative for different local regions, including regions that are more important or harder to reconstruct. SFAN therefore designs a channel attention module and a spatial attention module that exploit the interdependence between channels and the spatial locations of features. The two modules use two different pooling operations: for a feature map of W × H × C, the channel module produces an attention weight of 1 × 1 × C, and the spatial module produces an attention weight of W × H × 1. In the SFAN decoding process, a 3 × 3 convolutional layer first refines the image features to further enhance them. The refined feature information is then divided into two parts, which are input to the spatial attention module and the channel attention module, respectively. After the feature information is enhanced by the two modules, the feature descriptors they output are fused to adaptively adjust the feature representation. Finally, the features are upsampled with PixelShuffle [35] in the sub-pixel convolution module.
As an efficient upsampling algorithm, PixelShuffle uses convolution and a sub-pixel operation to transform an (H × W) low-resolution input image into an (rH × rW) high-resolution image, where r is the upsampling factor. When the upsampling multiple is n, PixelShuffle needs n² channels to reorganize the space around each pixel. Therefore, in order to reduce the structural complexity of the network and match the characteristics of PixelShuffle [33], SFAN adopts the spatial attention module and the channel attention module in parallel.
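The n² channel requirement can be seen in a short PyTorch sketch (the channel counts are illustrative): a convolution first expands the features to 3·r² channels, and PixelShuffle then trades those channels for an r-times larger spatial grid.

```python
import torch
import torch.nn as nn

r = 2  # upsampling factor; needs r^2 extra channels per output channel
upsample = nn.Sequential(
    nn.Conv2d(64, 3 * r * r, 3, padding=1),  # 64 -> 3*r^2 feature channels
    nn.PixelShuffle(r),                       # (B, 3*r^2, H, W) -> (B, 3, r*H, r*W)
)
```

Because the sub-pixel rearrangement only reshapes data, the learnable cost stays in the preceding convolution, which is why ESPCN-style decoders are parameter-efficient.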
The effects of image magnification and the native low resolution on PixelShuffle must both be considered. Different from the head-to-tail connection of concatenation [33] (Figure 4a), channel merging (Figure 4b) fuses the channels corresponding to each feature index to connect the high-frequency and low-frequency information in the model. Image features are thereby restored to a greater extent, and the accuracy of pixel reorganization is improved.
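A possible reading of the index-wise channel merge, sketched in PyTorch (this interleaving is our interpretation of Figure 4b, not code from the paper): channel i of one branch is placed next to channel i of the other, instead of the head-to-tail ordering produced by plain concatenation.

```python
import torch

def channel_merge(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Index-wise channel merge of two equally shaped feature maps:
    output channel order is a0, b0, a1, b1, ... rather than the
    a0..aC, b0..bC ordering of torch.cat (illustrative sketch)."""
    assert a.shape == b.shape
    n, c, h, w = a.shape
    stacked = torch.stack((a, b), dim=2)   # B x C x 2 x H x W, pairs by index
    return stacked.reshape(n, 2 * c, h, w)  # interleave along the channel axis
```

With this layout, the r²-channel groups consumed by PixelShuffle mix spatial-attention and channel-attention features at every index, rather than drawing each group from a single branch.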

D. HIGH FREQUENCY RESIDUAL MODULE
Inspired by the DRSR [11] network, a global connection performs nearest-neighbor interpolation on the input image, and the interpolated low-resolution image is superimposed on the enhanced image features. Since the input low-resolution image already contains the low-frequency features, the high-frequency residual module can predict the high-resolution result directly from the shallow feature information via upsampling, thereby greatly reducing the number of model parameters.
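The global high-frequency residual described above can be sketched as follows (PyTorch; the class and argument names are illustrative assumptions, and `body` stands in for the encoder-decoder stack):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalResidualSR(nn.Module):
    """Sketch of the overall residual structure: a nearest-neighbour
    upsample of the input supplies the low-frequency base, and the
    network body only predicts the high-frequency residual."""
    def __init__(self, body: nn.Module, scale: int):
        super().__init__()
        self.body = body    # encoder + decoder producing scale-times output
        self.scale = scale

    def forward(self, lr):
        base = F.interpolate(lr, scale_factor=self.scale, mode="nearest")
        return base + self.body(lr)  # low-frequency base + learned residual
```

Because `base` is a parameter-free interpolation, the skip connection adds no weights; it only changes what the body has to learn, which is the source of the faster convergence reported in the ablation.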
The loss function is very important for training the network model, and a suitable loss function can effectively improve network performance. The L1 loss ($\text{Loss}_1$) and L2 loss ($\text{Loss}_2$) are commonly used, but since the $\text{Loss}_2$ norm squares the error, $\text{Loss}_1$ is more robust than $\text{Loss}_2$ [42]. $\text{Loss}_1$ minimizes the sum of absolute differences between the target values and the estimated values. Therefore, SFAN uses $\text{Loss}_1$ as the loss function, and its expression is:

$\text{Loss}_1 = \frac{1}{N} \sum_{i=1}^{N} \left| Y_i - \bar{Y}_i \right|$,

where $Y_i$ is the target value and $\bar{Y}_i$ is the estimated value.
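For reference, the same mean-absolute-error criterion is available directly in PyTorch (the tensors below are illustrative values, not data from the paper):

```python
import torch
import torch.nn as nn

# L1 loss: mean absolute difference between target and estimate,
# matching the Loss_1 used for training SFAN.
criterion = nn.L1Loss()
target = torch.tensor([1.0, 2.0, 3.0])
estimate = torch.tensor([1.5, 2.0, 2.0])
loss = criterion(estimate, target)  # mean(|0.5| + |0.0| + |1.0|) = 0.5
```

Unlike the squared L2 error, each residual contributes linearly, so a few badly reconstructed pixels cannot dominate the gradient.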

III. EXPERIMENTAL RESULTS AND ANALYSIS

A. EXPERIMENTAL SETTINGS

1) EXPERIMENTAL ENVIRONMENT SETTING
The experiments are implemented in Python 2.7; the operating system platform is Ubuntu 20.04, and the proposed SFAN runs on two NVIDIA 2080Ti GPUs.

2) DATASETS
This experiment uses the high-quality DIV2K image dataset [22], [42], [43] for training and validation. After training and validation on DIV2K, following references [6], [22], [26], [29], [33], [34], [44], five common public datasets (Set5, Set14, B100, Urban100 and Manga109) are used for model performance evaluation. The batch size is set to 64, the number of epochs to 500, and each epoch iterates 200 times. The reconstruction scale is first set to ×2; scales of ×3 and ×4 are then applied for comparison with other mainstream SR algorithms.

3) EVALUATION INDEX
To facilitate comparison with other advanced networks, following references [6], [22], [26], [29], [33], [34], [44], this paper selects the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) as the quantitative evaluation indices. PSNR is defined through the mean square error (MSE); the higher the PSNR and SSIM values, the better the reconstruction. The equations of MSE [22], PSNR [22] and SSIM [22] are as follows:

$\text{MSE} = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} \left( x(i,j) - y(i,j) \right)^2$,

$\text{PSNR} = 10 \log_{10} \left( \frac{\text{MAX}_x^2}{\text{MSE}} \right)$,

$\text{SSIM}(X, Y) = \frac{(2\mu_X \mu_Y + C_1)(2\sigma_{XY} + C_2)}{(\mu_X^2 + \mu_Y^2 + C_1)(\sigma_X^2 + \sigma_Y^2 + C_2)}$,

where x and y are H × W monochrome images and $\text{MAX}_x$ is the maximum pixel value; if each sampling point is represented by B-bit linear pulse code modulation, then $\text{MAX}_x$ is $2^B - 1$. X and Y are the two images, $(\mu_X, \mu_Y)$ and $(\sigma_X, \sigma_Y)$ are their means and standard deviations, $\sigma_{XY}$ is the covariance of X and Y, and $C_1$ and $C_2$ are positive constants that prevent division by zero.
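The PSNR computation from MSE can be sketched as follows (NumPy; assumes 8-bit images by default, so MAX_x = 255):

```python
import numpy as np

def psnr(x: np.ndarray, y: np.ndarray, max_val: float = 255.0) -> float:
    """PSNR in dB from the MSE between two images, as in the
    evaluation section: higher means a better reconstruction."""
    mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images: PSNR is unbounded
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Two edge cases illustrate the scale: identical images give infinite PSNR, while images differing everywhere by the full dynamic range give 0 dB.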

B. ABLATION EXPERIMENT
SFAN contains three main components: an encoder module, a decoder module, and a high-frequency residual module. To evaluate the relative impact of the structured and fused attention on SFAN's components, ablation experiments are performed on SFAN. The same configuration is used for all ablation analyses. For convenience, following reference [11], only the 800 training images in the DIV2K database are used for training.
VOLUME 10, 2022

1) ENCODER MODULE AND DECODER MODULE
The ablation experiments compare the 3 × 3 DSCL in the encoder residual block and the parallelism and fusion in the decoder attention module. The ablation results are shown in Table 1, where √ indicates that a component is adopted and × indicates that it is not. When the 3 × 3 depthwise separable convolutional layer is not used, an ordinary 3 × 3 convolutional layer is used instead; when the parallel mode is not used, the serial mode is used instead; when the fusion method is not used, the end-to-end concatenation method is used instead. For the ablation experiments of the encoder and decoder modules, this paper reports the PSNR and SSIM values on Set14 at 398 epochs with scale ×2.
Table 1 shows that the modules added to or improved in SFAN's encoder and decoder contribute to both feature extraction and parameter reduction. Any two components perform better than one alone, and SFAN, which uses all three, performs best.

2) HIGH-FREQUENCY RESIDUAL MODULE
In order to verify the effectiveness of the high-frequency residual module, a set of comparative experiments was designed for SFAN to compare performance with and without the nearest-neighbor upsampling module. For these ablation experiments, this paper reports the PSNR and SSIM values on the DIV2K training set at 598 epochs with scale ×2. Figure 5 shows the accuracy after training on DIV2K with and without the high-frequency residual module: the network using the module levels off at epoch 390, while the network without it levels off at epoch 415. The results in Figure 5 confirm that adding the nearest-neighbor upsampling module improves the convergence characteristics of the model.

C. EXPERIMENTAL RESULTS AND ANALYSIS
In order to verify the effectiveness of SFAN, it was quantitatively evaluated on the DIV2K validation set and compared with the recent models Bicubic [25], MCRAN+ [25] and EDSR [39], as shown in Table 2. The results show that on DIV2K, at scales (×2, ×3, ×4), the PSNR of SFAN is higher than that of MCRAN+; that is, SFAN's SR reconstruction is excellent. After training on the DIV2K dataset, the accuracy curve of SFAN is shown in Figure 6.
Following references [25], [36], the parameters and performance of SFAN and some recent SR networks are compared in Table 3. Compared with EDSR [39], DBPN [45], RDN [46], MCRAN [25] and A²N [36], MemNet and A²N have fewer parameters but relatively poor performance, while SFAN combines few parameters with high performance. SFAN thus achieves a good balance between model parameters and performance.
Table 4 shows that SFAN's PSNR and SSIM are generally the best or among the best. As the scale increases, the margin of SFAN's PSNR/SSIM over the best of the other models grows, so SFAN performs better in SR.
In addition to the objective metrics, a subjective visual comparison is also conducted. Following references [2], [5], [20] and [26], the effects at scales ×2, ×3 and ×4 are shown in Figure 7, Figure 8 and Figure 9, respectively. Figures 7-9 clearly show that the clarity and smoothness of SFAN's reconstructed images are better than those of the other models, especially for fringe images. For the tiles in Figure 8 (in the box), although RCAN's color looks darker, SFAN's boundary characteristics are more distinct. The SFAN algorithm thus has a clear advantage in the SR of high-magnification images and performs well in subjective evaluation.

IV. CONCLUSION
In order to improve the resolution of images in pipeline detection, and aiming at the problems of feature extraction and attention fusion in image SR networks, this paper proposes the structured fusion attention network (SFAN), a model built on DRSR, RCAN, DRAM, CSFM and other SR optimization networks. Different residual attention modules are set up to extract more image features. The feature information in the spatial attention module and the channel attention module is refined and strengthened by a new channel fusion method, and the PixelShuffle upsampling method reconstructs the decoder output. The overall structured residual design adds the residual image output by the decoder to the interpolated input image, accelerating the convergence of the network's high-frequency residual, thereby improving the reconstruction effect of image SR and reducing the number of model parameters.
Compared with some classic and recent methods on the public datasets (Set5, Set14, B100, Urban100, Manga109), the results show that SFAN has fewer than 2M parameters; that is, its parameter count is low. In the comparison between SFAN and related super-resolution networks, the PSNR and SSIM values obtained are generally the largest. At the same time, SFAN performs better in subjective visual comparison, especially for image streak characteristics; that is, SFAN can provide better help for extracting details inside pipelines.
YAONAN DAI received the master's degree from the Wuhan Institute of Technology, in 2019, where he is currently pursuing the Ph.D. degree. His research interests include robot motion control and mobile robot trajectory planning.
JIUYANG YU received the master's degree from Nanjing Tech University, in 1986. He is currently a second-level Professor with the School of Mechanical and Electrical Engineering, Wuhan Institute of Technology. His research interests include robot motion control and pressure vessel.
TIANHAO HU received the bachelor's degree from the Wuhan Institute of Technology, in 2019. He is conducting research in robotics engineering with the Wuhan Institute of Technology. His research interests include robot vision and robot dynamics analysis.
YANG LU received the master's degree from the Wuhan University of Science and Technology, in 2019. His current research interests include machine vision and image processing.
XIAOTAO ZHENG received the Ph.D. degree from the East China University of Science and Technology, in 2011. He is currently a Professor with the School of Mechanical and Electrical Engineering, Wuhan Institute of Technology. He has published more than 100 peer-reviewed articles. His current research interests include safety analysis, analysis and design, high-temperature structural integrity principle, and online monitoring technology of pressure equipment.