Lightweight Dual-Stream Residual Network for Single Image Super-Resolution

The deep convolutional neural network has achieved great success in the single image super-resolution task. Among well-known super-resolution methods, deep learning-based algorithms clearly show the most advanced performance. However, the most advanced algorithms currently use complex networks with a large number of parameters, which makes it difficult to apply deep learning algorithms on mobile devices. To solve this problem, we propose a lightweight dual-stream residual network (LDRN) for single image super-resolution, which has better reconstruction quality than most current advanced lightweight algorithms. Due to its few parameters and low computational expense, real-time and mobile applications of our network can be easily realized. On the basis of the residual module, we propose a new residual unit, which uses two depthwise separable (DW) convolutions to obtain a better balance between feature extraction capacity and lightweight performance. We further design a dual-stream residual block, which contains a multiplication branch and an addition branch. The dual-stream residual block improves reconstruction performance more effectively than expanding the network width. In addition, we also design a new up-sampling module that simplifies previous up-sampling methods. Extensive experimental results show that our network has better reconstruction performance and lightweight performance than most existing state-of-the-art algorithms. Our code is available at https://github.com/Jiangyichun-cust/pytorch-LDRN.


I. INTRODUCTION
Single image super-resolution (SISR) is a classic problem in the field of low-level computer vision. Addressing the problem has been proven to be useful in many practical cases, such as medical imaging [1], [2], infrared imaging [3]-[5] and remote sensing imaging [6]-[8]. From a mathematical point of view, SISR needs to build a degradation model from high-resolution (HR) images to low-resolution (LR) images and fit the inverse function [9]. The goal of SISR is to generate high-quality super-resolution (SR) images with clear details that are as close as possible to the HR images. However, some high-frequency information is lost in the degradation process, so one LR image may correspond to multiple possible HR images, which makes the SISR problem ill-posed.
The associate editor coordinating the review of this manuscript and approving it for publication was Khin Wee Lai.

In order to solve the SISR problem, many methods have been proposed and proved to be feasible, including interpolation-based [10], reconstruction-based [11], and learning-based algorithms [12]. The interpolation-based algorithms are simple and effective; they are based on the assumption that pixels obey simple functions such as linear, quadratic or cubic functions, and they only consider the structural information of the image in a small range. Therefore, interpolation-based algorithms cannot reconstruct the high-frequency information of the image, which leads to poor SR image quality. The reconstruction-based algorithms usually require multiple frames of images to provide the necessary information, and they require artificially well-designed prior knowledge to constrain the reconstruction locally or non-locally. In these methods, when the scale factor becomes larger, the reconstruction performance drops sharply.
Recently, the mainstream SISR methods are learning-based algorithms, among which the SISR algorithms based on sparse representation [12]-[14] show a better effect than traditional algorithms. They treat LR images as low-dimensional manifolds of HR images and construct an overcomplete dictionary through regularization terms. The SISR methods based on Deep Convolutional Neural Networks (DCNN) have shown strong feature representation capabilities in recent years, and have been proven to be able to fit very complex mappings.
Since SISR is a problem with a high-dimensional solution space, DCNN-based algorithms are very suitable for solving it. At present, DCNN-based super-resolution algorithms provide excellent reconstruction performance in various super-resolution tasks. Dong et al. [15] proposed SRCNN, consisting of only three convolutional layers. Although the number of layers is small, it achieves better reconstructed image quality than most traditional methods. However, using only a few convolutional layers in an SISR network is insufficient for good reconstruction capability. VDSR, proposed by Kim et al. [16], adopted a very deep network structure and proved that network depth and the number of parameters have an obviously positive effect on reconstruction quality in SISR. Ledig et al. [17] proposed SRResNet, which made great progress compared with previous methods and was the first to combine ResNet [18] with SISR. Lim et al. [19] studied the influence of some commonly used structures on SISR, and further improved and enhanced the residual network for SISR, called EDSR. Although DCNN-based algorithms have achieved great success, there are still some problems in most networks: 1) Most networks tend to be designed with more parameters and computational expense to improve their performance. This means that these methods are difficult to port to mobile devices and cannot achieve real-time processing. As a low-level vision task, super-resolution is often only a part of image preprocessing, so such inefficient methods are unacceptable in many fields. 2) Most network structures were designed for other tasks, especially image classification and segmentation. If these networks are directly applied to SISR, the expected results may not be achieved. The current residual network structures do not make enough targeted improvements for SISR, and they ignore the complexity of the degradation from HR images to LR images.
3) Some widely used structures in SISR can be improved and simplified to reduce unnecessary parameters and computational expense. In order to solve the above problems, a lot of work in recent years has focused on lightweight DCNN-based algorithms. A widespread strategy is to share parameters between different blocks, as in the Multi-scale Residual Network for Image Super-Resolution (MSRN) and the Fast and Lightweight Network for Single-Image Super Resolution (MADNet) [20], [21]. However, the number of parameters still has a great impact on the performance of the network, which cannot be solved well by sharing parameters alone. For SISR, a network that better fits the inverse function of the image degradation process is more conducive to improving efficiency. In this article, we propose a lightweight neural network called LDRN with good reconstruction ability, which achieves a better balance between performance, parameter volume, and computation volume. As shown in Figure 1, our network is superior to many state-of-the-art algorithms. In summary, the main contributions of this article are as follows: 1) We build an efficient architecture that incorporates multi-level features, adds global skip connections to increase sparsity, and removes all batch normalization (BN), which suits low-level computer vision tasks like SISR. 2) We propose a new residual unit, which is composed of two depthwise separable convolutions (DW convolutions) that perform dimension expansion and dimension compression on features respectively. It has good feature extraction capability and fewer parameters. 3) We propose a new dual-stream residual block consisting of dual branches. The additive branch and the multiplicative branch can better fit the inverse of the image degradation process. As a result, better reconstruction performance is obtained under the same parameter amount and computational expense.
4) We propose a new up-sampling module, which removes the dimensional expansion before pixel shuffle and instead uses dimensional compression to directly output SR images through pixel shuffle. It significantly reduces the number of parameters with almost no performance loss.

II. RELATED WORK
Deep Convolutional Neural Networks have been widely used in many fields [22]-[25] since the success of AlexNet in image classification tasks [26], and single image super-resolution is one of them. In this paper, we focus on SISR methods based on lightweight neural networks. Due to space limitations, we only discuss two branches of related work: DCNN-based SISR algorithms and model compression.

A. DCNN-BASED SINGLE IMAGE SUPER-RESOLUTION
Recently, deep learning has achieved significant improvements on SISR tasks. SRCNN, proposed by Dong et al., uses only three convolutional layers to represent the patch extraction and representation, non-linear mapping and reconstruction operations. SRCNN interpolates the LR image to the size of the HR image at the beginning, which introduces a lot of redundant information and greatly increases the computational expense. Therefore, FSRCNN was proposed to solve these problems [27]. It processes features at the same size as the LR image, and then uses deconvolution to up-sample the image at the tail. In subsequent research, this structure of processing at the LR image size and then up-sampling at the tail is widely used, which greatly reduces the computational expense and avoids the misconvergence arising from simply stacking convolutional layers. Inspired by ResNet, in addition to stacking a large number of convolutional layers, VDSR also adds a global skip connection to alleviate the vanishing/exploding gradients problem. VDSR also shows that a deeper network can bring better reconstruction results. Kim et al. [28] applied the recurrent neural network to the SISR task and proposed DRCN, a deep recursive architecture that can reuse layers to share parameters. SRResNet obtains good feature extraction ability by stacking a large number of residual blocks, and introduces PReLU to learn negative coefficients. EDSR deletes the batch normalization layer in the residual block, because experiments found that batch normalization destroys the absolute differences between pixels, which is unfavorable for low-level computer vision tasks.
On the other hand, an LR image corresponds to a complex set of SR solutions since SISR is an ill-posed problem. Although the state-of-the-art algorithms above achieve high PSNR/SSIM, a traditional DCNN-based algorithm tends to generate the average of these SR solutions [29], so the generated image will be visually blurred to a certain extent. Some research focuses on generating clear and seemingly real images, such as SRGAN, ESRGAN, etc. [30]-[32]. These methods design a discriminator that provides an adversarial loss to reduce the gap between the generated data distribution and the real data distribution. Although GANs can generate data that looks realistic enough, the generated data is not faithful to the ground truth, which is unacceptable in medical, military and other fields. This also causes GAN-based networks to perform poorly in PSNR/SSIM.
In this article, we focus on the pixel-level accuracy of the reconstructed image. At the same time, our goal is to design a network with good reconstruction performance, and to achieve real-time and lightweight super-resolution.

B. MODEL COMPRESSION
The lightweight design of neural networks has always been a research hotspot, and we will provide a brief survey of some current model compression methods.
Network pruning reduces the size of the network by deleting some unnecessary branches [33], [34], but it still requires training a large network and designing methods to evaluate the importance of branches. Knowledge distillation requires a large teacher network to guide the training of a smaller distilled network [35], [36]. It also needs to calculate the gap between the two networks' feature maps, which requires more complex techniques. At present, the most common strategy is to directly design a lightweight deep convolutional neural network. For example, MobileNetV1 proposed DW convolution [37], which uses one convolution kernel per feature map, and then uses 1 × 1 convolution to linearly fuse the outputs. Such an operation significantly reduces the amount of parameters and computational expense, with little performance degradation. In order to reduce the amount of model parameters and improve performance, MobileNetV2 introduces a residual structure and bottleneck [38]. A scale factor is set to expand the channels in the compressed model. As a result, good performance is obtained while reducing the number of parameters. ShuffleNet replaces the 1 × 1 convolution in MobileNet with pointwise group convolutions, and performs channel shuffle after the first group convolution (GConv) in the residual unit to reduce boundary effects [39].
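The parameter saving of the depthwise separable convolution described above can be checked with simple arithmetic; the channel sizes below are illustrative choices, not values from the paper:

```python
# Parameter-count sketch for the depthwise separable (DW) convolution used by
# MobileNetV1: a k x k depthwise convolution (one filter per input channel)
# followed by a 1 x 1 pointwise convolution that fuses the channels.

def standard_conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution (no bias)."""
    return k * k * c_in * c_out

def dw_separable_params(c_in, c_out, k):
    """k x k depthwise convolution + 1 x 1 pointwise convolution (no bias)."""
    return k * k * c_in + c_in * c_out

if __name__ == "__main__":
    c_in = c_out = 64
    std = standard_conv_params(c_in, c_out, 3)   # 3*3*64*64 = 36864
    dws = dw_separable_params(c_in, c_out, 3)    # 3*3*64 + 64*64 = 4672
    print(std, dws, dws / std)                   # reduction factor ~ 1/9 + 1/64
```

For a 3 × 3 kernel the reduction factor is roughly 1/9 + 1/C_out, which is why DW convolution is attractive for lightweight designs.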
Recently, many new algorithms have appeared for SR-related model compression. Inspired by MobileNetV1, CARN designed a compression strategy and a cascading framework with recursive layers [40]. In addition, CARN-M, a lightweight version of CARN, achieves high precision with fewer parameters and less computational expense. IDN, proposed by Hui et al. [41], combines an enhancement unit and a compression unit in an information distillation module to mix multiple types of features and extract useful information. Since each layer has fewer filters and group convolution is used, it has higher speed and fewer parameters. Li et al. [42] referred to the structure of U-Net, adopted the residual block design of MobileNetV2, and proposed a Super Lightweight Super-Resolution Network (s-LWSR). s-LWSR also designs an information pool module to avoid overfitting and fully utilize the information.
Although these lightweight networks have achieved impressive results, there is still a lot of room for improvement. We can still build a more efficient lightweight network to better balance the relationship between parameters, computational expense and performance.

III. PROPOSED METHOD
As mentioned in Section I, we propose a new lightweight dual-stream residual network for SISR, called LDRN. This section introduces the proposed LDRN in detail. First, we provide a comprehensive and detailed description of the main network framework of LDRN and its sub-modules, including the proposed dual-stream residual block and up-sampling module, which are the core of our method. Then, we explain the loss function used in network training.

A. NETWORK FRAMEWORK
Before explaining the technical details of these sub-modules, it is necessary to point out that the goal of LDRN is to establish a mapping from low-resolution images I_LR to high-resolution images I_HR, so we formulate the whole process as:

I_SR = F(I_LR)  (1)

where I_SR represents the super-resolution image output by the network, and F(·) represents the mapping relationship from I_LR to I_HR fitted by the network. As shown in Figure 2, our LDRN mainly consists of four sub-modules: the shallow feature extraction module (SFEM), deep feature extraction module (DFEM), feature fuse module (FFM), and up-sampling module (UM). First, the low-resolution image is processed by the SFEM into shallow features, which serve as the input for the subsequent DFEM. The outputs of the SFEM are also saved and fused with the deep features after the DFEM to provide texture information of the image. Such a shallow feature extraction module does not need a complex structure, so we use two convolution layers to extract the shallow features. If the output channels of the DFEM differ from the other sub-modules, a 1 × 1 convolution block for dimension reduction is added at the tail of the SFEM. The operation can be defined as:

f_2 = σ(w_2 ∗ σ(w_1 ∗ I_LR))  (2)
f_3 = σ(w_1×1 ∗ f_2)  (3)

where σ represents the LeakyReLU activation function, w represents the weight of a convolution kernel, and C_m is the number of channels of module m. According to our extensive experiments, batch normalization and bias reduce the performance of the network to varying degrees, so we remove all of them from our network. Then, the shallow features are sent into the DFEM, which consists of a series of well-designed blocks that extract the deep features. These blocks are specially designed for super-resolution tasks. They are not only lightweight and fast, but also have good performance.
Like previous works, these blocks are connected in series to increase the capacity and depth of the network and obtain deep image structure information, which can be used to reconstruct high-quality SR images. However, if we only use deep features to reconstruct the SR image, some details in the LR image will be lost; these details are retained in the shallower features. Therefore, in order to use different levels of features and improve the performance of our network, a dense fusion structure is designed in the DFEM. The outputs of all blocks in the DFEM are concatenated and pass through a 1 × 1 convolutional layer for information fusion and selection. The operation can be defined as:

f_{i+3} = F_i(f_{i+2}; θ_i), i = 1, 2, …, n − 3  (4)
f_{n+1} = w_1×1 ∗ F_cat(f_4, f_5, …, f_n)  (5)

where F_i(·) represents the proposed i-th dual-stream residual block, θ_i represents the parameters of the i-th dual-stream residual block, and F_cat(·) represents the concatenation operation. Finally, we further use global feature fusion, which means adding the output of the SFEM f_3 and the output of the DFEM f_{n+1}:

f_{n+2} = f_3 + f_{n+1}  (6)

The global skip connection structure helps to ensure that the network focuses on reconstructing the high-frequency information lost in the image degradation process, while ignoring the low-frequency information that is easy to reconstruct and occupies most of the image. After the global feature fusion, these features are processed by two convolution layers and pixel shuffle to generate the super-resolution images. This well-designed up-sampling module differs from traditional up-sampling methods: it is faster and lighter, and suitable for lightweight neural networks. So, Equation 1 can be written as:

I_SR = F_UM(F_FFM(F_DFEM(F_SFEM(I_LR))))  (7)

where F_UM represents the up-sampling module, F_FFM represents the feature fuse module, F_DFEM represents the deep feature extraction module, and F_SFEM represents the shallow feature extraction module.
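The pipeline described above (SFEM, DFEM with dense fusion, global skip connection, up-sampling) can be sketched in PyTorch. This is a minimal sketch, not the authors' implementation: plain convolutions stand in for the dual-stream residual blocks, and the block count and widths are illustrative:

```python
import torch
import torch.nn as nn

class LDRNSketch(nn.Module):
    """Structural sketch of the LDRN pipeline (assumed, simplified layers)."""

    def __init__(self, channels=64, n_blocks=4, scale=4, img_channels=3):
        super().__init__()
        # SFEM: two convolution layers, no BN and no bias (as in the paper)
        self.sfem = nn.Sequential(
            nn.Conv2d(img_channels, channels, 3, padding=1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
        )
        # DFEM: stand-in blocks; every block output is kept for dense fusion
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1, bias=False),
                          nn.LeakyReLU(0.2, inplace=True))
            for _ in range(n_blocks)
        ])
        # FFM: 1x1 convolution fusing the concatenated block outputs
        self.fuse = nn.Conv2d(n_blocks * channels, channels, 1, bias=False)
        # UM: compress directly to scale^2 * img_channels, then pixel shuffle
        self.um = nn.Sequential(
            nn.Conv2d(channels, scale * scale * img_channels, 3, padding=1, bias=False),
            nn.PixelShuffle(scale),
        )

    def forward(self, x):
        shallow = self.sfem(x)
        feats, f = [], shallow
        for block in self.blocks:
            f = block(f)
            feats.append(f)
        deep = self.fuse(torch.cat(feats, dim=1))
        return self.um(deep + shallow)   # global feature fusion (skip connection)
```

For a 16 × 16 LR input and scale 4, the sketch produces a 64 × 64 SR output, mirroring Equation 7's composition of the four sub-modules.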

B. DUAL-STREAM RESIDUAL BLOCK
In our proposed LDRN, the deep features are extracted by a series of fast and lightweight basic residual blocks, which is one of the most important designs. Here, we provide a detailed description of the structure of our proposed dual-stream residual block.
• Residual unit: Our network no longer uses traditional convolution, with its large amount of calculation and parameters. The traditional 3 × 3 convolution can be replaced by a DW convolution and a closely following 1 × 1 convolution without significant performance degradation. MobileNetV2 has shown through experiments that the width of the DW convolution has a significant influence on the performance of the residual block. Based on extensive experiments, we found that the number of channels is also important for network performance in low-level computer vision tasks. Specifically, expanding the dimension in the residual block always performs better than compressing the dimension.
The residual unit used in our dual-stream residual block is shown in Figure 3. Different from previous works, our LDRN uses two DW convolutions to extract features: one initially extracts features alongside the dimension expansion, while the other refines features after the dimension compression. The 1 × 1 convolution performs information exchange between different channels, and its parameter amount is much larger than that of a DW convolution, so it is inefficient to use it only to compress channels. We therefore insert a DW convolution after compressing the dimension to improve the efficiency of the residual unit. Then, we replace all the activation functions in the residual unit with LeakyReLU to learn negative factors. Experiments show that the proposed structure has good feature extraction capability and fewer parameters. For the input x_r, our proposed residual unit F_e can be defined as:

F_e(x_r) = σ(dw2_3×3 ∗ (w_compress_1×1 ∗ σ(dw1_3×3 ∗ σ(w_expand_1×1 ∗ x_r))))  (8)

where dw1_3×3 and dw2_3×3 represent the weights of the DW convolutions, w_expand_1×1 represents the 1 × 1 convolution for expanding the dimension, and w_compress_1×1 represents the 1 × 1 convolution for compressing the dimension.
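A minimal PyTorch sketch of this residual unit follows. The exact layer ordering (expand, DW, compress, DW) is our reading of the text and may differ from the authors' released code; channel width and expansion rate are illustrative:

```python
import torch
import torch.nn as nn

class ResidualUnit(nn.Module):
    """Sketch of the proposed unit: 1x1 expand -> DW 3x3 -> 1x1 compress -> DW 3x3."""

    def __init__(self, channels=64, expand=2):
        super().__init__()
        wide = channels * expand
        self.body = nn.Sequential(
            nn.Conv2d(channels, wide, 1, bias=False),          # 1x1 expand dimension
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(wide, wide, 3, padding=1,
                      groups=wide, bias=False),                # DW conv 1: extract
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(wide, channels, 1, bias=False),          # 1x1 compress dimension
            nn.Conv2d(channels, channels, 3, padding=1,
                      groups=channels, bias=False),            # DW conv 2: refine
            nn.LeakyReLU(0.2, inplace=True),
        )

    def forward(self, x):
        return self.body(x)
```

The `groups=channels` argument is what makes a `Conv2d` depthwise: each input channel gets its own 3 × 3 kernel, so no cross-channel mixing (and far fewer weights) happens there.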
• Dual information stream: For the SISR task, it has been proved that the residual structure is a good choice because it ensures that the backward gradient propagates more effectively. The residual structure also allows the network to focus on estimating the high-frequency information of the images, which is lost during image degradation. However, a large number of studies have found that simply stacking residual blocks to build a very deep network still causes training difficulties; as the network deepens, with a massive increase in the number of parameters and calculations, the performance growth is quite limited. Therefore, it is necessary to design an efficient residual structure. Based on the proposed residual unit, we propose a dual information stream structure with a multiplicative information stream and an additive information stream. The traditional residual block and its improved versions only learn the residual information (additive difference) of the main information stream. But the degradation from an HR image to an LR image is a highly complex process, and we think it is not enough to consider only additive differences. Therefore, the proposed structure learns not only the residual information but also multiplicative factors. For an input x, the proposed dual-stream residual block can be defined as:

f_m = x ⊗ F_e^m(x; θ_m)  (9)
f_a = F_e^a(x; θ_a)  (10)
y = f_m + f_a  (11)

where F_e^m and F_e^a represent the residual units of the multiplicative and additive branches, ⊗ represents element-wise multiplication, and y is the output of the block. The proposed dual-stream residual block is more efficient than increasing the expansion rate or width under the same amount of parameters, which will be further proved in subsequent experiments.
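The dual information stream can be sketched as a block whose output combines a multiplicative and an additive branch. The combination `y = x * M(x) + A(x)` below is our reading of the text, and the small depthwise-separable stacks are simplified stand-ins for the paper's residual units:

```python
import torch
import torch.nn as nn

def _branch(channels, expand=2):
    """Simplified stand-in for one residual-unit branch (expand -> DW -> compress)."""
    wide = channels * expand
    return nn.Sequential(
        nn.Conv2d(channels, wide, 1, bias=False),
        nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(wide, wide, 3, padding=1, groups=wide, bias=False),
        nn.Conv2d(wide, channels, 1, bias=False),
    )

class DualStreamBlock(nn.Module):
    """Sketch of the dual-stream residual block: multiplicative + additive streams."""

    def __init__(self, channels=64):
        super().__init__()
        self.mul_branch = _branch(channels)   # learns per-pixel multiplicative factors
        self.add_branch = _branch(channels)   # learns the additive residual

    def forward(self, x):
        # multiplicative stream scales the identity path; additive stream corrects it
        return x * self.mul_branch(x) + self.add_branch(x)
```

Since both branches share the input and the same channel width, swapping the multiplicative branch for a second additive one (as in the ablation of Section IV) leaves the parameter count unchanged.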

C. UP-SAMPLING MODULE
Up-sampling LR images to HR images is a crucial step of SISR, which is directly related to the overall structure of the network. Previous research interpolated LR images to HR size at the beginning of the network or up-sampled multiple times in steps. These methods usually use bicubic interpolation to expand the resolution of the image and then learn high-frequency information through subsequent modules. However, too much redundant information is introduced into the images, so the number of network parameters and calculations increases significantly. In response, recent work tends to learn image features at the LR resolution and then up-sample the image at the end of the network. Such up-sampling modules mainly include transposed convolution, unpooling and pixel shuffle. It has been proven that pixel shuffle has not only satisfactory performance but also high calculation speed. There are several types of common up-sampling modules based on pixel shuffle, as shown in Figure 5. The current method first expands the dimension of the feature map to ensure that the number of channels is the same as before, and then shuffles those feature maps into high-resolution feature maps. However, the convolutional layer used to expand the dimensionality generates a large number of parameters, which does not benefit a lightweight network enough. Therefore, a new up-sampling module is proposed in this paper, as shown in Figure 5, which is efficient and lighter. All dimension-expanding operations are removed from the up-sampling module, which reduces its parameters. For an SR image with C_SR channels, we first input the feature maps of the DFEM, after 1 × 1 convolution and fusion, into a decoder; the decoder then outputs feature maps with 16 × C_SR channels. Finally, these feature maps are shuffled into the final SR image by pixel shuffle. The proposed up-sampling module can be defined as:

I_SR = F_PS(F_decoder(f))  (12)

where f represents the fused features, F_decoder represents the decoder convolutions, and F_PS represents the operation of PixelShuffle.
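The saving from compressing instead of expanding before pixel shuffle can again be checked by parameter counting. The sketch below contrasts a common expanding head (conv to r² · C channels, shuffle, then conv to RGB) with the compressing head described above (conv directly to r² · C_SR channels, then shuffle); C = 64, r = 4 and 3 × 3 kernels are illustrative choices:

```python
# Parameter-count sketch for two pixel-shuffle up-sampling heads.
# PixelShuffle itself has no weights; only the convolutions count.

def conv_params(c_in, c_out, k=3):
    return k * k * c_in * c_out

def expanding_head_params(c=64, r=4, c_sr=3):
    # conv C -> r^2 * C, pixel shuffle, then conv C -> C_SR
    return conv_params(c, r * r * c) + conv_params(c, c_sr)

def compressing_head_params(c=64, r=4, c_sr=3):
    # conv C -> r^2 * C_SR, then pixel shuffle outputs the SR image directly
    return conv_params(c, r * r * c_sr)

if __name__ == "__main__":
    print(expanding_head_params(), compressing_head_params())
```

Under these assumed sizes the expanding head needs 591,552 weights versus 27,648 for the compressing head, a roughly 21× reduction in the up-sampling module alone.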

D. LOSS FUNCTIONS
The loss function of the neural network is used to compare the output of the network with the ground truth. In order to obtain an SR image that is as close as possible to the HR image and has a satisfactory visual effect, we mix a perceptual loss and a pixel loss to form the loss function of LDRN. First, we use the L1 loss function commonly used in SISR as the pixel loss. The L1 loss function is different from the L2 loss function: it does not over-penalize large errors and has different convergence properties. The L1 loss function can be defined as:

L_1 = (1/N) Σ_p |I_HR(p) − I_SR(p)|  (13)

where p represents a pixel of the image and N represents the total number of pixels of the HR or SR image. Previous studies have shown that the L1 function is less likely to fall into local optima. However, both the L1 function and the L2 function suffer from splotchy artifacts in flat areas. Therefore, we add the MS-SSIM loss function as the perceptual loss, which can be defined as:

L_MS-SSIM = 1 − MS-SSIM(I_SR, I_HR)  (14)

Our total loss function can be defined as:

L = a · L_1 + (1 − a) · L_MS-SSIM  (15)

where a represents the weight used to balance the pixel loss and the perceptual loss. According to our experiments, we set a to 0.9.
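The mixed loss above can be sketched as follows. A full MS-SSIM implementation is lengthy, so here `ms_ssim` is treated as a given similarity score in [0, 1] rather than computed; a = 0.9 follows the paper:

```python
import numpy as np

def l1_loss(sr, hr):
    """Mean absolute pixel difference between SR and HR images (pixel loss)."""
    return np.abs(sr - hr).mean()

def total_loss(sr, hr, ms_ssim, a=0.9):
    """Weighted mix: a * L1 + (1 - a) * (1 - MS-SSIM)."""
    return a * l1_loss(sr, hr) + (1.0 - a) * (1.0 - ms_ssim)

if __name__ == "__main__":
    hr = np.zeros((4, 4))
    sr = np.full((4, 4), 0.1)            # constant error of 0.1 -> L1 = 0.1
    print(total_loss(sr, hr, ms_ssim=0.95))   # 0.9*0.1 + 0.1*0.05 -> ~0.095
```

With a = 0.9 the pixel term dominates, so the perceptual term mainly acts as a regularizer against the splotchy artifacts mentioned above.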

IV. EXPERIMENTS
In Section III, we explained the structure of our proposed network in detail; in this section, we further evaluate its performance. We compare the proposed network with state-of-the-art networks and perform ablation experiments to prove the effectiveness of the proposed components.

A. IMPLEMENTATION DETAILS
• Dataset: We train our network on two widely used super-resolution datasets: DIV2K [43] and Flickr2K. DIV2K includes 800 LR-HR image pairs, and Flickr2K includes 2650 image pairs. The HR patch size is cropped to 256 × 256, and the minibatch size is 32. We only flip the images randomly (vertically or horizontally) without any other data augmentation. In order to compare fairly with other methods, we use four widely used test datasets: Set5 [44], Set14 [45], BSD100 [46] and Urban100 [47]. Set5 and Set14 are composed of 5 and 14 images respectively, and the resolution of these images is low. BSD100 and Urban100 are each composed of 100 high-resolution images, which are more challenging to reconstruct than Set5 and Set14. All images are used to evaluate the performance of the algorithm, and the performance on each dataset is calculated separately.
• Training Setting: Since large-scale super-resolution tasks are more challenging, we only focus on the 4× up-sampling task. We use 16 dual-stream residual blocks for the DFEM of LDRN with an expansion rate of 2. The widths of the SFEM, DFEM, FFM and UM are all set to 64. For implementation, we trained our network with a learning rate of 0.0001 for all layers for a total of 400 epochs. For optimization, we used Adam with betas of (0.9, 0.999) and a weight decay of 1e-8. In addition, in order to train LDRN better, we performed some fine-tuning and adopted training techniques such as annealing and multi-step learning rate schedules; we mark this training result as LDRN+. All experiments were conducted using PyTorch on an NVIDIA RTX3090 GPU.
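The optimizer settings above translate directly to PyTorch. The multi-step milestones below are illustrative, since the paper does not list its exact fine-tuning schedule, and a single convolution stands in for LDRN:

```python
import torch
import torch.nn as nn

# Stand-in model; in practice this would be the full LDRN network.
model = nn.Conv2d(3, 3, 3, padding=1)

# Adam with the paper's hyperparameters: lr 1e-4, betas (0.9, 0.999), wd 1e-8
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4,
                             betas=(0.9, 0.999), weight_decay=1e-8)

# Multi-step decay as one possible "multi-step LR" technique (milestones assumed)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[200, 300], gamma=0.5)
```

Calling `scheduler.step()` once per epoch halves the learning rate at the chosen milestone epochs.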
• Testing Setting: We mainly focus on the lightweight performance and reconstruction performance of the model. Lightweight performance includes two indicators, parameters and multi-adds, which represent the memory and computational requirements when the network is actually working. We assume that the output SR image is 1280 × 720 when calculating the multi-adds. Reconstruction performance mainly includes the widely used PSNR and SSIM. We convert the SR image to the YCbCr space according to convention and then calculate the PSNR and SSIM on the Y channel.
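The Y-channel PSNR convention above can be sketched as follows. The BT.601 luma weights are the usual choice in SR evaluation code, though the exact footroom convention (full range vs. 16-235) varies between codebases:

```python
import numpy as np

def rgb_to_y(img):
    """Luma channel of an RGB image in [0, 255] (ITU-R BT.601 weights)."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b

def psnr(sr, hr, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    mse = np.mean((sr.astype(np.float64) - hr.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

if __name__ == "__main__":
    hr = np.zeros((8, 8, 3))
    sr = np.full((8, 8, 3), 16.0)        # constant error of 16 -> Y-channel MSE = 256
    print(psnr(rgb_to_y(sr), rgb_to_y(hr)))   # 10*log10(255^2/256) ~= 24.05 dB
```

SSIM is computed on the same Y channel; its windowed implementation is longer and is omitted here.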

B. COMPARISON WITH STATE-OF-THE-ART MODELS
We compare the performance of our LDRN with 11 state-of-the-art SISR methods: bicubic interpolation [10], A+ [48], SRCNN [15], FSRCNN [27], LapSRN [49], CARN-M [40], IDN [41], VDSR [16], MemNet [50], MADNet-LF [21] and s-LWSR32 [42]. We use the source code and weight parameters provided by the authors, or train according to the method provided by the authors, to obtain the results of all methods. Because we are more concerned with the challenging 4× super-resolution, we adopt the 4× versions of the comparison algorithms. For networks of different sizes like CARN and s-LWSR, we selected their lightweight versions for a fair comparison. TABLE 1 shows the SR results on the four testing datasets. It is obvious that our network performs satisfactorily on all four test datasets. The performance of our network on the Set5 and Set14 datasets is not the best: the PSNR of our network is lower than that of several compared algorithms, but its SSIM is the best. In view of the small size of both Set5 and Set14, their results are not accurate enough to measure network performance persuasively. The B100 and Urban100 datasets have more images and are more difficult to reconstruct. The performance of our network on the B100 and Urban100 datasets is better than all the compared state-of-the-art algorithms. Our network makes significant progress in both PSNR and SSIM compared to previous studies, which demonstrates its superior super-resolution performance. In addition, it is necessary to point out that although our network is not the smallest in terms of parameters and multi-adds, it is acceptable on existing mobile devices. Our network can still easily achieve real-time super-resolution without taking up a lot of memory.
In order to better prove the advantages of our proposed method, we train a larger s-LWSR40 on the same datasets and with the same options as our proposed method; it has almost the same number of parameters and more multi-adds. Under the same conditions, our proposed network performs better on the four test datasets than s-LWSR40.
To further demonstrate the satisfactory super-resolution capability of our algorithm, we conducted a qualitative analysis of the network's reconstruction capability. In addition to Table 1, we show some SR images from the general datasets. For the sake of fairness, the comparison images we use are from the best super-resolution results published by the original authors; we do not make subjective comparisons for algorithms whose authors did not provide the original image data. These results are raw data without any type of compression. The figures show that our LDRN has a strong reconstruction ability in SISR, both in terms of the visual authenticity of the images and the accuracy of reconstruction. These qualitative results strongly support the advancement and practicality of our network.
Urban100 is one of the most difficult general datasets to reconstruct, so we selected some SR images from it to show the advanced performance of our proposed algorithm, as shown in Figure 5 and Figure 6. In Figure 5, the guardrail in the SR image produced by our algorithm shows no visible blur and is obviously closer to the HR image than those of other algorithms. In Figure 6, the building windows reconstructed by our algorithm have a more natural shape than those of other algorithms. This shows that our network can effectively extract internal information within a specific image.
One of the difficulties of single image super-resolution is the Moiré pattern introduced in the image degradation process. Whether the details of an image can be reconstructed accurately depends on how much internal information the network extracts from the image and how much external information it learns from the dataset. Due to their limited depth or parameters, lightweight neural networks are often affected by the Moiré pattern, which causes reconstruction errors. As shown in Figure 7, although our network is a lightweight network like the other compared algorithms, its good feature extraction ability lets it reconstruct image details more accurately, and it is less affected by the Moiré pattern.
Whether it is qualitative analysis or quantitative analysis, the results show that our method is better than other advanced methods, and the reconstructed SR image is closer to the HR image.

C. ABLATION STUDY
To verify the validity of the proposed structures, we conducted ablation experiments. We give the results and analysis of these experiments in this section.
First, we examine the effectiveness of the Basic Residual Block. We use E = 1, E = 2, E = 3, and E = 4 to denote the expansion rate of the feature extraction module in the dual-stream residual blocks. We also replace our proposed residual unit with the residual unit of MobileNetV2 at expansion rate 2 and mark this variant LDRN(M). As shown in TABLE 2, although PSNR and SSIM increase with the expansion rate, this paper prioritizes a lightweight and fast network, and the performance gain from a large expansion rate does not justify its cost. The expansion rate E = 2 keeps the parameters and MACs low while still improving performance significantly, so we adopt E = 2. Our residual unit also clearly outperforms the MobileNetV2 residual unit, which confirms that the proposed residual unit improves the performance of the network.
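As a rough sketch of why a moderate expansion rate is an attractive trade-off, the following pure-Python parameter count compares a standard convolution with a hypothetical residual unit built from two depthwise separable convolutions at expansion rate E. The layer layout, channel width (64), and kernel size (3) are illustrative assumptions, not the exact configuration of our unit:

```python
def conv_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def dw_separable_params(c_in, c_out, k):
    """Depthwise k x k convolution followed by a 1x1 pointwise convolution."""
    return c_in * k * k + c_in * c_out

def residual_unit_params(c, e, k=3):
    """Hypothetical unit: 1x1 expansion to e*c channels,
    two depthwise separable convolutions, 1x1 projection back to c."""
    hidden = e * c
    expand = conv_params(c, hidden, 1)
    body = 2 * dw_separable_params(hidden, hidden, k)
    project = conv_params(hidden, c, 1)
    return expand + body + project

# one standard 3x3 conv at width 64 vs the sketched unit at several E
print("standard 3x3:", conv_params(64, 64, 3))
for e in (1, 2, 3, 4):
    print(f"E = {e}:", residual_unit_params(64, e))
```

The counts grow roughly quadratically in E for the pointwise layers, which is consistent with the diminishing return of large expansion rates observed in TABLE 2.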
Then, we replace the multiplicative branch in the dual information stream with an additive branch. As shown in TABLE 3, although the modified network has the same number of parameters and MACs, its performance on the public datasets drops. The visual comparison is shown in Figure 8: after replacing the multiplication branch with an addition branch, the reconstruction quality is visibly reduced and more reconstruction errors appear, which is consistent with the quantitative results. This shows that the proposed multiplicative path in the dual information stream helps the network map LR images to HR images more accurately and improves the quality of the SR images.

TABLE 3. Result of an ablation study on the effect of the dual information stream. The evaluation is on the Set5, Set14, BSD100, and Urban100 public datasets.

Next, we focus on the design of the up-sampling module. We attach the proposed up-sampling module and several traditional up-sampling modules to the tail of our network and train them. Following the order of the four modules described in Section 3, we denote the four up-sampling modules as type-a, type-b, type-c, and our proposed type-d.
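Before turning to the up-sampling results, the two combination rules compared in the dual-stream ablation can be made concrete with a toy element-wise sketch. Plain Python lists stand in for feature maps, and the branch outputs are hypothetical values, not those of our actual block:

```python
def combine_mul(x, a, b):
    """Multiplicative variant: branch b gates branch a element-wise
    before the residual addition."""
    return [xi + ai * bi for xi, ai, bi in zip(x, a, b)]

def combine_add(x, a, b):
    """Additive variant used in the ablation: the two branches are summed."""
    return [xi + ai + bi for xi, ai, bi in zip(x, a, b)]

x = [1, 2]     # stand-in residual input
a = [3, -1]    # stand-in output of the first branch
b = [2, 4]     # stand-in output of the second branch
print(combine_mul(x, a, b))  # [7, -2]
print(combine_add(x, a, b))  # [6, 5]
```

The multiplicative form lets one branch adaptively scale the other per element, whereas pure addition cannot express such modulation; this is one intuition for the performance gap in TABLE 3.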

TABLE 4. Result of an ablation study on the effect of the up-sampling module. The evaluation is on the Set5, Set14, BSD100, and Urban100 public datasets.

As shown in Table 4, the up-sampling module proposed by us reduces a large number of parameters and much computational expense with only a slight performance degradation, which makes it suitable for a lightweight network.
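As an illustrative back-of-the-envelope comparison (not the exact layer configurations of types a-d), the following sketch counts the parameters of a hypothetical progressive conv + pixel-shuffle upsampler against a single convolution straight to 3·s² channels followed by one pixel shuffle, assuming 64 input channels and 3×3 kernels:

```python
def conv_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def staged_upsampler(c, scale, k=3):
    """Progressive x2 stages: conv to 4*c channels + pixel shuffle each stage,
    then a final conv to 3 RGB channels."""
    total, s = 0, scale
    while s > 1:
        total += conv_params(c, 4 * c, k)
        s //= 2
    return total + conv_params(c, 3, k)

def direct_upsampler(c, scale, k=3):
    """Single conv straight to 3*scale^2 channels + one pixel shuffle."""
    return conv_params(c, 3 * scale * scale, k)

for s in (2, 4):
    print(f"x{s}: staged = {staged_upsampler(64, s)}, "
          f"direct = {direct_upsampler(64, s)}")
```

Under these assumptions the single-conv design needs roughly an order of magnitude fewer parameters at x4, which is consistent with the parameter savings reported in Table 4.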
Finally, we also replace some of the residual blocks in CARN-M with our proposed Dual-stream Residual Block, and its original up-sampling module with our up-sampling module.

TABLE 5. Result of replacing the modules in CARN-M with our proposed modules. The evaluation is on the Set5, Set14, BSD100, and Urban100 public datasets.

As shown in Table 5, the improved CARN-M is marked with '*' to distinguish it from the original CARN-M. Although the improved CARN-M shows a noticeably lower PSNR on Set14, it performs better than the original CARN-M on B100 and Urban100 while using fewer parameters. This shows that our proposed modules can be transplanted into existing methods to improve their performance.

V. CONCLUSION
In this article, we propose a Lightweight Dual-Stream Residual Network for single image super-resolution. Specifically, we design a new residual unit, a simple and effective structure that efficiently extracts image features. Based on this residual unit, we put forward the dual-stream residual block, which adapts well to complex super-resolution tasks and yields accurate SR images. Additionally, we optimize the up-sampling module to obtain a lighter and faster structure that still produces high-quality reconstruction results. Extensive experiments show that our model outperforms most state-of-the-art lightweight SR algorithms, with excellent reconstruction and lightweight performance. In future work, we will focus on further improving the performance of the network, developing a multi-scale version, and deploying the network on mobile devices.

WEIDA ZHAN is currently a Professor and a Supervisor of Ph.D. candidates at Changchun University of Science and Technology. His research interests include digital image processing, infrared imaging technology, and automatic target recognition technology.
DEPENG ZHU was born in Wuwei, Gansu, China, in 1996. He received the B.S. degree in electronic and information engineering from Changchun University of Science and Technology, where he is currently pursuing the Ph.D. degree. His research interests include image fusion, image registration, and object detection.

VOLUME 9, 2021