Agricultural Pest Super-Resolution and Identification With Attention Enhanced Residual and Dense Fusion Generative and Adversarial Network

The growth of the most important field crops, such as rice, wheat, maize, and soybean, is affected by various pests, and crop production decreases because of many categories of insects. Deep learning technologies have significantly increased the efficiency of identifying and controlling agricultural pest attacks. However, agricultural pest images are often obscure and unclear because cameras are deployed sparsely in real farmland, which often makes pests difficult to recognize and monitor. Additionally, existing classification and segmentation methods perform unsatisfactorily on low-resolution images because they are pre-trained on clear, high-resolution datasets. It is therefore crucial to restore and upscale low-resolution pest images in order to improve classification accuracy and the recall rate of instance segmentation. In this paper, we propose a generative adversarial network (GAN) with quadra-attention and residual and dense fusion mechanisms to transform low-resolution pest images. Compared with previous state-of-the-art PSNR-oriented super-resolution methods, our proposed method reconstructs images better and achieves state-of-the-art performance. The experimental results show that after reconstruction with our proposed GAN, the recall rate increased by 182.89% and classification accuracy also improved substantially. Moreover, our proposed method could reduce the density of the camera layout in agricultural Internet of Things (IoT) monitoring systems and the cost of infrastructure, which makes it practical for real-world applications.


I. INTRODUCTION
The production of crops is associated with many factors, for example, climate change, plant diseases, and insect pests. According to recent research, about half of the world's crop yield is lost to pest infestations and crop diseases [1]. Crop pests cause significant damage to crops and mainly reduce crop productivity, in both developing and developed countries. Hence, it is of great significance to identify insects in crops at an early stage and select optimal treatments, which is an important prerequisite for reducing crop loss and pesticide use. There are a great many insect species, and the number of individuals belonging to the same species is enormous, yet traditional pest identification is typically time-consuming and inefficient. Therefore, to improve the efficiency of agricultural production, a new, effective recognition method is needed.
The associate editor coordinating the review of this manuscript and approving it for publication was Chang-Hwan Son.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Nowadays, with the development of deep learning, many researchers have applied this technology to different fields and proposed many excellent approaches. Because of its success in diverse areas, deep learning has also been used in agriculture. Thenmozhi and Reddy [2] proposed a CNN model and used it to classify three pest datasets, achieving classification accuracies of 96.75%, 97.47%, and 95.97%, respectively. Li et al. [3] proposed a fine-tuned GoogLeNet model to recognize their collected pest dataset and obtained an improvement of 6.22% over the state-of-the-art method. Tetila et al. [4] analyzed network weights for the automatic recognition of soybean leaf diseases in images taken directly from a small, inexpensive unmanned aerial vehicle (UAV). They evaluated four deep neural network models trained with different parameters for fine-tuning (FT) and transfer learning, tested them in an end-to-end computer vision approach on a dataset created from real flight inspections, and found that their method substantially improved identification accuracy. Yu and Son [5] proposed a method for recognizing apple leaf diseases with a region-of-interest-aware deep convolutional neural network. Their method uses two subnetworks; the first divides the input image into three areas: background, leaf area, and spot area indicating leaf disease. Their experimental results showed that recognition accuracy can be increased by using the predicted region-of-interest (ROI) feature map, and that the method outperforms conventional state-of-the-art methods: transfer-learning-based methods, a bilinear model, and a multiscale-based deep feature extraction and pooling approach. Cheng et al. [6] fine-tuned deep convolutional neural networks (DCNNs) to classify a 10-class pest dataset and achieved satisfactory recognition results. Yue et al. [7] proposed a super-resolution method for agricultural pest image restoration and detection and also achieved high detection performance. These works demonstrate the feasibility and effectiveness of applying deep learning to pest identification. However, pest images obtained from real farms are typically unclear and occupy very few pixels because cameras are deployed relatively sparsely in farmland. Although existing classification and segmentation methods are very mature, they still cannot achieve satisfactory results on low-resolution pest images of small pixel scale. Poor image quality significantly degrades the results of pre-trained classifiers and segmentation systems, which are typically trained on clear, high-resolution datasets. To enhance pest classification and segmentation systems, low-resolution pest images need to be upscaled to increase their spatial resolution and reconstruct their high-frequency details. Therefore, in this paper, we propose a generative adversarial network (GAN) with quadra-attention and residual and dense fusion mechanisms to transform low-resolution pest images. The proposed network is named PSRGAN. To evaluate PSRGAN, we compare it with state-of-the-art super-resolution methods in terms of classification accuracy and instance segmentation recall rate on upscaled images.
We present experiments with eight classic classification networks on the pest images of the Xie1 [8] and Xie2 [9] datasets. In addition, we use Mask R-CNN [10] as the object instance segmentation model and choose Diostrombus, Chauliops, and Callitettix as our research objects.
Experimental results show that classification accuracy and segmentation recall rate improve when images are transformed with super-resolution methods. Compared with the state-of-the-art super-resolution methods considered in this research, PSRGAN provides superior performance in improving pest image classification accuracy and segmentation recall rate. Our main contributions are summarized as follows.
1) We propose a novel image super-resolution method for upscaling agricultural pest images.
2) To the best of our knowledge, our proposed method is the first to introduce GANs into agricultural pest image processing.
3) According to benchmark tests, PSRGAN outperforms state-of-the-art super-resolution methods in visual quality and in the improvement it brings to classification accuracy and segmentation recall rate.

II. RELATED WORKS
Many automatic pest recognition systems based on machine learning (ML) have been proposed.
Wen et al. [11] proposed an effective feature-based automatic classification method for orchard insects using six machine learning algorithms and verified it on five common orchard pest species, achieving a maximum classification accuracy of 89.5%.
Faithpraise et al. [12] combined the k-means clustering algorithm with correspondence filters to achieve pest detection and recognition. The detection relied on the correspondence filters to identify different types of pests, which is time-consuming and ineffective when the dataset is large. Wang et al. [13] designed an automatic identification system for insect specimen images at the order level. The system used artificial neural networks (ANNs) and a support vector machine (SVM) as the pattern recognition methods, performed with good stability, and reached an accuracy of 93%. Xie et al. [8] developed an insect recognition system using advanced multiple-task sparse representation and multiple-kernel learning (MKL) techniques, which combine multiple features of insect species to enhance recognition performance; on 24 common pest species they collected, their method outperformed other state-of-the-art methods for generic insect categorization. Traditional machine learning algorithms [14]-[18] have been shown to perform well when the number of pest species is small, but when multiple features must be extracted manually they cannot match the accuracy of deep learning methods. In recent years, many deep learning methods have been applied to pest identification to improve crop management and health, achieving state-of-the-art results. Shijie et al. [19] built a convolutional neural network model based on VGG16 [20] and transfer learning to detect tomato pests and diseases from leaf images, achieving an average classification accuracy of 89%. In [21], an 8-layer CNN was developed for the visual localization and classification of agricultural pest insects by computing a saliency map and applying deep convolutional neural network (DCNN) learning; it achieved a mean average precision (mAP) of 0.951 for the classification of 12 important paddy insect species. Dawei et al. [22] put forward a transfer-learning-based diagnostic system for pest detection and recognition that achieved an accuracy of 93.84%, comparable to human experts and to a traditional neural network. They also used their model to recognize two types of weeds, achieving an accuracy of 98.92%. In [23], DCNNs, namely LeNet-5 and AlexNet, were used to classify crop pest images; 82 common pest types were classified with an accuracy of up to 91%, which the authors report shows that the method is both feasible and highly effective.
These previous works show that deep learning is a useful tool for pest identification. Nevertheless, pest images captured in farmland are typically low-resolution and occupy very few pixels, which limits pest identification accuracy. To enhance pest segmentation and recognition systems, low-resolution pest images need to be upscaled. Therefore, we propose a generative adversarial network (GAN) with quadra-attention and residual and dense fusion mechanisms to upscale low-resolution pest images.

A. NETWORK ARCHITECTURE
1) OVERALL ARCHITECTURE
Our proposed PSRGAN consists of a generator and a discriminator. The generator network of PSRGAN is shown in Figure 1. Low-resolution images are fed into the generator and split into two branches. In the first branch, the output of the first convolution layer is upscaled by the CARAFE [24] module and then passes through the self-attention module. In the second branch, the features pass through a second convolution layer and a PReLU activation layer and are then fed into the reconstruction net, which predicts the details. The reconstruction net uses global residual learning: the upscaled images and edges are combined with the predicted details and passed through a final convolution layer to generate the high-resolution images.
To discriminate real high-resolution (HR) images from generated super-resolution (SR) images, we design the discriminator network illustrated in Figure 2. We use LeakyReLU activations and avoid max-pooling throughout the network. The discriminator is trained to solve the maximization problem of the adversarial objective. It consists of seven convolutional layers whose number of filter kernels doubles from 64 to 512. Strided convolutions reduce the image resolution each time the number of features is doubled. The resulting 512 feature maps are followed by a final LeakyReLU activation and two linear layers that output the probability used to classify the sample as real or generated.
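The downsampling in the discriminator can be checked with the standard convolution output-size formula. The sketch below uses a hypothetical layout: only the seven layers, the 64 to 512 channel progression, and the stride-on-doubling rule come from the text; the 3 × 3 kernels, padding of 1, and 96 × 96 input crop are assumptions.

```python
# Hypothetical layout of the seven conv layers as (out_channels, stride).
# From the text: 64 -> 512 channels, doubling, strided conv whenever the
# channel count doubles. Kernel 3, padding 1 and the ordering are assumptions.
DISC_LAYERS = [(64, 1), (128, 2), (128, 1), (256, 2), (256, 1), (512, 2), (512, 1)]


def conv_out(size, kernel=3, stride=1, padding=1):
    """Output height/width of a 2-D convolution on a square input."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_discriminator(size, layers=DISC_LAYERS):
    """Return (channels, spatial size) after each conv layer."""
    shapes = []
    for channels, stride in layers:
        size = conv_out(size, stride=stride)
        shapes.append((channels, size))
    return shapes


shapes = trace_discriminator(96)          # 96x96 HR crop (assumed patch size)
channels, size = shapes[-1]
flat_features = channels * size * size    # input width of the first linear layer
print(shapes[-1], flat_features)          # (512, 12) 73728
```

Under these assumptions the three strided layers shrink a 96 × 96 crop to 12 × 12, which fixes the input width of the first linear layer; a different crop size or stride placement changes that width.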

2) RESIDUAL AND DENSE FUSION
As a network gets deeper, its accuracy on the training set can drop while its error rate rises; this is known as the degradation problem. In principle, a deeper model should not have a higher error rate than its shallower counterpart. Degradation is not caused by overfitting; rather, SGD optimization becomes more difficult as the model grows more complex, which yields a poorer model. ResNet [25] was proposed to solve the degradation problem in deep learning. The residual block in ResNet [25], shown in Figure 3 (a), is implemented with a residual connection that adds the input and output of the block element-wise. This simple addition introduces no extra parameters and negligible computation, yet it can greatly speed up training and improve the training result. As layers are added to the model, residual blocks effectively counter the degradation problem.
Compared with ResNet [25], DenseNet [26] proposes a more aggressive dense connection mechanism that connects all layers: each layer takes all preceding layers as additional input. In DenseNet [26], the outputs of all previous layers are concatenated along the channel dimension to form the input of the next layer. This dense connectivity improves gradient backpropagation and makes the network easier to train.
The dense block in DenseNet [26], shown in Figure 3 (b), is implemented with dense connections. Since each layer has direct access to the final error signal, implicit deep supervision is realized. In addition, DenseNet [26] reuses features, so each layer only needs to produce a small number of new feature maps (a small growth rate); this keeps the number of parameters small and the computation efficient.
To make good use of both residual and dense connections, we combine them in a block named the residual and dense fusion block, shown in Figure 3 (c). Compared with residual networks, the generator of PSRGAN preserves more information from the previous layers. Compared with dense networks, it halves the channel growth rate, which significantly reduces the number of network parameters and makes a deeper structure trainable. The fusion operation is given in Formula 1.
In Formula 1, $[\cdot]$ denotes the concatenation operation, and $F^{1}_{i-1}$ and $F^{2}_{i-1}$ denote the sliced parts of the features from the previous layer; … and $F^{1}_{i-1}$ make the dense fusion. These operations make our network partly a residual network [25] and partly a dense network [26]. At the end of each block, a convolution layer reshapes the feature map to its original size, as given in Formula 2, where $W_t$ is the weight of a 1 × 1 convolution for block-feature fusion, $F_{j-1}$ denotes the feature of the preceding residual and dense fusion block, and $F_j$ is the output feature of the current residual and dense fusion block. With residual and dense fusion, our network performs residual and dense connections simultaneously, which halves the channel growth and improves network performance.
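The following NumPy sketch illustrates one possible reading of the residual and dense fusion block, using only 1 × 1 convolutions to stay short. It is not a transcription of Formula 1: how the features are sliced, the per-layer output width, and the local residual added after the fusion convolution (standing in for $W_t$ in Formula 2) are all assumptions, and `rdf_block` is a hypothetical name. What it does show is the channel bookkeeping claimed above: the concatenated (dense) path grows by only half the growth rate per layer, while the residual path keeps a fixed width.

```python
import numpy as np


def conv1x1(weight, x):
    """1x1 convolution (channel mixing only). weight: (out, in); x: (in, H, W)."""
    return np.einsum("oc,chw->ohw", weight, x)


def rdf_block(x, num_layers=3, growth=16, rng=None):
    """Hypothetical residual-and-dense fusion block built from 1x1 convolutions.

    Each layer sees all current features and emits c + growth // 2 channels:
    the first c are added to the fixed-width residual path, the remaining
    growth // 2 are concatenated to the dense path.
    Returns the block output and the channel count before the fusion conv.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    c = x.shape[0]
    half = growth // 2
    residual = x                                   # residual path, always c channels
    dense = np.zeros((0,) + x.shape[1:])           # dense path, grows by `half` per layer
    for _ in range(num_layers):
        features = np.concatenate([residual, dense], axis=0)
        w = rng.standard_normal((c + half, features.shape[0])) * 0.1  # placeholder weights
        out = np.maximum(conv1x1(w, features), 0.0)                    # conv + ReLU
        residual = residual + out[:c]                                  # residual fusion
        dense = np.concatenate([dense, out[c:]], axis=0)               # dense fusion
    fused = np.concatenate([residual, dense], axis=0)
    w_t = rng.standard_normal((c, fused.shape[0])) * 0.1  # stands in for W_t
    return conv1x1(w_t, fused) + x, fused.shape[0]       # local residual to F_{j-1} (assumed)


x = np.random.default_rng(1).standard_normal((32, 8, 8))
y, fused_channels = rdf_block(x)
plain_dense_channels = 32 + 3 * 16   # a plain dense block with growth 16, for comparison
print(y.shape, fused_channels, plain_dense_channels)  # (32, 8, 8) 56 80
```

With three layers, the fused tensor has 56 channels instead of the 80 a plain dense block with the same growth rate would accumulate, and the block output returns to the input width.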

3) CARAFE UPSAMPLING
Upsampling can be expressed as a dot product, at each location, between an upsampling kernel and the pixels in the corresponding neighborhood of the input feature map; this is called feature reassembly. Nearest-neighbor and bilinear upsampling determine the kernel only from the spatial positions of the pixels and do not use the semantic information of the feature map. They can be regarded as uniform upsampling, and their receptive field is usually small. The upsampling kernel of deconvolution is not computed from the distances between pixels but learned by the network. However, the same kernel is applied at every position of the feature map, so the content of the feature map cannot be captured, and a large number of parameters and computations is introduced, especially when the kernel is large. Dynamic filtering predicts a different set of upsampling kernels for each position of the feature map, but its parameter and computation costs grow even faster, and it is known to be difficult to train. Compared with these upsampling operators, the CARAFE [24] operator, shown in Figure 4, has a larger receptive field during reassembly and guides the reassembly process according to the input content, while remaining relatively lightweight. Specifically, CARAFE [24] first predicts the upsampling kernels from the input feature map, with a different kernel at each position, and then reassembles the features using the predicted kernels. CARAFE [24] has achieved significant improvements in different tasks while adding only a small number of parameters and computations.
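In the CARAFE paper's notation, the two steps are $W_{l'} = \psi(N(\chi_l, k_{encoder}))$ and $\chi'_{l'} = \phi(N(\chi_l, k_{up}), W_{l'})$, where $l'$ is a target location in the upsampled map, $l$ is its source location, and $N(\chi_l, k)$ is the $k \times k$ neighborhood of $\chi_l$. The NumPy sketch below follows those two steps with a simplified kernel predictor (a single 1 × 1 convolution instead of CARAFE's channel compressor and content encoder); `predict_kernels` and `reassemble` are hypothetical names, and the random weights are placeholders for learned ones.

```python
import numpy as np


def pixel_shuffle(x, r):
    """(C*r*r, H, W) -> (C, H*r, W*r)."""
    cr2, h, w = x.shape
    c = cr2 // (r * r)
    return x.reshape(c, r, r, h, w).transpose(0, 3, 1, 4, 2).reshape(c, h * r, w * r)


def predict_kernels(x, weight, scale=2, k_up=5):
    """Simplified kernel prediction (psi): a 1x1 conv to scale^2 * k_up^2 channels,
    rearranged so every output location gets its own k_up*k_up kernel, then a
    softmax so each kernel sums to 1. CARAFE itself also uses a channel
    compressor and a k_encoder x k_encoder content encoder here."""
    logits = pixel_shuffle(np.einsum("oc,chw->ohw", weight, x), scale)
    logits = logits - logits.max(axis=0, keepdims=True)
    kernels = np.exp(logits)
    return kernels / kernels.sum(axis=0, keepdims=True)   # (k_up^2, sH, sW)


def reassemble(x, kernels, scale=2, k_up=5):
    """Content-aware reassembly (phi): each output pixel is a weighted sum of the
    k_up x k_up neighbourhood around its source pixel, using its own kernel."""
    c, h, w = x.shape
    r = k_up // 2
    padded = np.pad(x, ((0, 0), (r, r), (r, r)))           # zero padding
    out = np.zeros((c, h * scale, w * scale))
    for i in range(h * scale):
        for j in range(w * scale):
            si, sj = i // scale, j // scale                 # source location l
            patch = padded[:, si:si + k_up, sj:sj + k_up]   # neighbourhood centred on l
            kernel = kernels[:, i, j].reshape(k_up, k_up)
            out[:, i, j] = (patch * kernel).sum(axis=(1, 2))
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 6, 6))
weight = rng.standard_normal((2 * 2 * 5 * 5, 8)) * 0.1
y = reassemble(x, predict_kernels(x, weight))
print(y.shape)  # (8, 12, 12)

# Sanity check: uniform kernels on a constant map give 1 where the whole
# neighbourhood lies inside the image, and 9/25 at the zero-padded corner.
flat = reassemble(np.ones((1, 6, 6)), np.full((25, 12, 12), 1 / 25))
```

Because every output location has its own normalized kernel, the reassembly can adapt to content, unlike the single shared kernel of a deconvolution.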
CARAFE [24] consists of two steps: first, a reassembly kernel is predicted for each target location according to its content; second, the features are reassembled with the predicted kernels. In the first step (Formula 3), the kernel prediction module $\psi$ predicts a location-wise kernel $W_l$ for each location $l$ based on the neighborhood of $\chi_l$. In the second step (Formula 4), the content-aware reassembly module $\phi$ reassembles the neighborhood of $\chi_l$ with the kernel $W_l$. We use the CARAFE [24] upsampling operator in the generator of PSRGAN without introducing many extra parameters or computations, and it yields good results in segmentation tasks.

1) SPATIAL ATTENTION
We use a spatial attention module [27] in the generator of PSRGAN, which helps enhance details during reconstruction and improves the effect of residual and dense fusion. As shown in Figure 5, the spatial attention module uses global residual learning, and the number of feature channels is doubled. After the convolutional layer, half of the channels are weighted by global information and the other half retain the original information. The two halves are then fused globally and locally, which improves the quality of the high-resolution image generated by the reconstruction net. The following formulas describe the spatial attention module.
where $b$ denotes the basic blocks, $\omega_b$ is the weight for the images, and $S$ denotes a slice operation. $T_1$ and $T_2$ are the features after the slice operation, $P$ denotes the channel-squeeze convolution, and $Q$ denotes the channel-multiplier convolution. $U$ is the sub-pixel shuffler [28] used for upscaling, which increases the width and height of $I_{LR}$ to the desired size.
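The sub-pixel shuffler $U$ [28] rearranges channels into space: a tensor with $C r^2$ channels of size $H \times W$ becomes $C$ channels of size $rH \times rW$. A minimal NumPy version, using the same channel ordering as PyTorch's `torch.nn.PixelShuffle`, is sketched below; the function name is our own.

```python
import numpy as np


def pixel_shuffle(x, r):
    """Sub-pixel shuffle: (C*r*r, H, W) -> (C, H*r, W*r)."""
    c_r2, h, w = x.shape
    if c_r2 % (r * r):
        raise ValueError("channel count must be divisible by r*r")
    c = c_r2 // (r * r)
    # Channel index c*r*r + i*r + j becomes the sub-pixel offset (i, j)
    # inside each r x r output cell.
    return x.reshape(c, r, r, h, w).transpose(0, 3, 1, 4, 2).reshape(c, h * r, w * r)


x = np.arange(4 * 2 * 2).reshape(4, 2, 2)  # 4 channels of a 2x2 map, r = 2
y = pixel_shuffle(x, 2)
print(y[0])
```

Each 2 × 2 cell of the output gathers one value from each of the four input channels at the same spatial position, so the upscaling itself adds no parameters; the preceding convolution learns what to put in each channel.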

2) CHANNEL ATTENTION
The channel attention module [27] in the generator learns by itself to enhance important channels and suppress useless ones, which helps decrease the parameters of PSRGAN and makes the network easier to converge. As shown in Figure 6, after the first convolution layer the feature maps are squeezed by a global pooling layer. Two 1 × 1 convolution layers then form a bottleneck. Finally, a sigmoid layer normalizes the result into weights that reweight the original output, yielding self-learned channel-wise attention.
The whole process can be written as
$$ S(x_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} x_c(i, j), $$
where $S(\cdot)$ is the squeeze operation, which reduces the features $x_c$ of every channel $c$ to a global mean, and $H$ and $W$ are the height and width of the input, and
$$ C(x) = \sigma\big(W_b\, \delta(W_a\, S(x))\big), $$
where $C(\cdot)$ denotes the channel attention operation, $\delta$ denotes the ReLU function, and $W_a$ and $W_b$ are two 1 × 1 convolutions. $W_a$ first reduces the channels to 1/16 of the original feature map, and $W_b$ then expands them back to the original shape, forming a bottleneck; $\sigma$ denotes the sigmoid function, which normalizes the weight of each channel. We use these weights to strengthen useful channels and suppress useless ones.
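The squeeze, bottleneck, and reweighting steps above can be sketched in NumPy as follows. Because the two 1 × 1 convolutions act on a 1 × 1 pooled map, they reduce to matrix-vector products; the 64-channel input and random weights are placeholders, and biases are omitted for brevity.

```python
import numpy as np


def channel_attention(x, w_a, w_b):
    """Squeeze-and-excitation style channel attention.
    x: (C, H, W); w_a: (C // 16, C) squeeze conv; w_b: (C, C // 16) expand conv."""
    s = x.mean(axis=(1, 2))                       # S(.): global mean of each channel
    z = np.maximum(w_a @ s, 0.0)                  # W_a then ReLU (bottleneck, C/16 channels)
    weights = 1.0 / (1.0 + np.exp(-(w_b @ z)))    # W_b then sigmoid: one weight per channel
    return x * weights[:, None, None], weights    # reweight the original features


rng = np.random.default_rng(0)
c = 64
x = rng.standard_normal((c, 8, 8))
w_a = rng.standard_normal((c // 16, c)) * 0.1
w_b = rng.standard_normal((c, c // 16)) * 0.1
y, weights = channel_attention(x, w_a, w_b)

# With all-zero weights every channel gets sigmoid(0) = 0.5.
y0, _ = channel_attention(x, np.zeros_like(w_a), np.zeros_like(w_b))
```

The sigmoid keeps every channel weight in (0, 1), so the module can only attenuate channels relative to one another; it never amplifies a channel beyond its original scale.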
In addition, as shown in Figure 3(d), we add the spatial attention module and the channel attention module to the residual and dense fusion block to build the basic block of the PSRGAN generator.

3) TEXTURE ATTENTION
The high-frequency details of an image are usually located on edges, so it is important to assign attention under the guidance of edges; we therefore use texture attention in the reconstruction network. Figure 7 shows the texture attention module used in the generator network, which is computed as in Formulas 9-10.
In Formula 9, $E$ denotes expanding the original number of channels: the initial input features are doubled, and the expanded features are divided into two parts. In Formula 10, $Up$ denotes the upsampling operation and $Canny$ denotes the Canny operator for extracting edges.
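Because the Canny operator also involves non-maximum suppression and hysteresis thresholding, the sketch below substitutes a Sobel gradient magnitude for it and uses nearest-neighbor upsampling for $Up$. Both are simplifications, and the way the edge map weights one of the two expanded halves is our assumption about Formulas 9-10, not a transcription of them; `texture_attention` is a hypothetical name.

```python
import numpy as np


def upsample_nearest(img, scale):
    """Nearest-neighbour upsampling, standing in for the Up operation."""
    return np.repeat(np.repeat(img, scale, axis=0), scale, axis=1)


def sobel_edges(img):
    """Sobel gradient magnitude scaled to [0, 1]; a simplified stand-in for Canny."""
    p = np.pad(img, 1, mode="edge")
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    mag = np.hypot(gx, gy)
    return mag / mag.max() if mag.max() > 0 else mag


def texture_attention(features, lr_image, w_e, scale):
    """Hypothetical edge-guided attention.
    features: (C, H, W) at HR size; lr_image: (H // scale, W // scale);
    w_e: (2C, C) 1x1 conv that doubles the channels (the E step)."""
    expanded = np.einsum("oc,chw->ohw", w_e, features)   # E: C -> 2C channels
    c = features.shape[0]
    t1, t2 = expanded[:c], expanded[c:]                   # split into two parts
    edges = sobel_edges(upsample_nearest(lr_image, scale))
    return np.concatenate([t1 * edges, t2], axis=0)       # edge-weighted half + untouched half


rng = np.random.default_rng(0)
lr = np.zeros((4, 4))
lr[:, 2:] = 1.0                                   # a vertical step edge
feats = rng.standard_normal((8, 8, 8))
out = texture_attention(feats, lr, rng.standard_normal((16, 8)), scale=2)
```

In flat regions the edge map is zero, so the weighted half is suppressed there and only survives near edges, while the other half passes through unchanged.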

4) SELF-ATTENTION
Traditional super-resolution models learn texture features easily but struggle to learn specific structural and geometric features. In a convolutional neural network, each convolution kernel is small, so each convolution operation covers only a small area around a pixel. Capturing features that are far apart is difficult, because it requires stacking many convolution and pooling operations, which make the height and width of the feature map smaller and smaller. Self-attention [29] obtains the global geometric features of the image in one step by directly computing the relationship between any two pixels, so the self-attention mechanism can learn dependencies among global features well. Figure 8 shows the self-attention module in the generator of PSRGAN. The number of feature maps is doubled before they pass through three 1 × 1 convolutional layers. Half of the feature maps are fed into the ConvQ layer and the other half into the ConvK layer, while the extra convolution maps added to the initial feature maps are fed into the ConvL layer, which lets the self-attention module learn more parameters. The transposed ConvQ feature maps are multiplied by the ConvK feature maps and passed through a softmax to obtain the attention maps. Finally, the ConvL feature maps are multiplied by the attention maps and passed through a 1 × 1 convolutional layer to obtain the self-attention maps. In effect, self-attention multiplies a feature map by its own transpose, so that the pixels at any two positions are directly related, which helps learn the dependency between any two pixels and thus global features. The following formulas describe the complete process.
The feature maps $X$ from the previous layer are transformed into two feature spaces $f$ and $g$ to calculate the attention, and $\alpha_{j,i}$ indicates the extent to which the model attends to the $i$-th location when synthesizing the $j$-th region. $N$ denotes the number of feature locations in the features from the previous layer, and the output of the attention layer is $O$. In the formulas above, $W_q$, $W_k$, $W_h$, and $W_u$ are learned weight matrices implemented as 1 × 1 convolution layers.
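The description above matches the self-attention layer of SAGAN, where $s_{ij} = f(x_i)^{\top} g(x_j)$ and $\alpha_{j,i} = \exp(s_{ij}) / \sum_{i=1}^{N} \exp(s_{ij})$. The NumPy sketch below assumes that formulation, with $f = W_q X$, $g = W_k X$, values $W_h X$, and output projection $W_u$; the channel widths and random weights are placeholders, and SAGAN's learnable residual scale is omitted.

```python
import numpy as np


def self_attention(x, w_q, w_k, w_h, w_u):
    """SAGAN-style self-attention over N = H*W locations.
    x: (C, N); w_q, w_k, w_h: (Cb, C); w_u: (C, Cb). All act like 1x1 convs."""
    f = w_q @ x                       # queries (Cb, N)
    g = w_k @ x                       # keys    (Cb, N)
    h = w_h @ x                       # values  (Cb, N)
    logits = g.T @ f                  # logits[j, i] = f(x_i) . g(x_j)
    logits = logits - logits.max(axis=1, keepdims=True)
    alpha = np.exp(logits)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)   # alpha[j, i]; each row sums to 1
    o = w_u @ (h @ alpha.T)           # o_j = W_u sum_i alpha[j, i] h(x_i)
    return o, alpha


rng = np.random.default_rng(0)
c, cb, hgt, wid = 8, 2, 4, 4
x = rng.standard_normal((c, hgt * wid))      # flattened (C, H*W) feature map
w_q, w_k, w_h = (rng.standard_normal((cb, c)) for _ in range(3))
w_u = rng.standard_normal((c, cb))
o, alpha = self_attention(x, w_q, w_k, w_h, w_u)

# With zero queries the attention is uniform, so every location gets the same output.
o_u, _ = self_attention(x, np.zeros_like(w_q), w_k, w_h, w_u)
```

The $N \times N$ attention matrix is what relates every pair of positions directly; its size is also why such layers are usually applied to reduced-resolution feature maps.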

C. ADVERSARIAL TRAINING
Adversarial training is also an important way to enhance the robustness of deep neural networks: training samples are mixed with small perturbations, and the network adapts to them and thus becomes robust to adversarial samples. In training PSRGAN, we instead use adversarial training in the GAN sense, with the generator and discriminator competing, to generate more visually pleasing images than a straightforward MSE loss between the generated images and the ground-truth high-resolution images would produce. The adversarial loss is given by the following formula.
where $G(\cdot)$ denotes the generator of PSRGAN and $D(\cdot)$ denotes the discriminator, $I_{HR}$ denotes real-world high-resolution images, and $I_{LR}$ denotes the low-resolution inputs, so that $G(I_{LR})$ are the generated pseudo high-resolution images. The total loss of our network is calculated by Formula 16:
$$ L = L_{content} + \psi L_{GAN} \qquad (16) $$
where $L$ is the total loss, $L_{GAN}$ denotes the adversarial loss, and $L_{content}$ denotes the total perceptual loss for the content. We set $\psi$ to 0.01 in this work.
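As a sketch of the losses, assuming the common binary cross-entropy GAN formulation with the non-saturating generator term $-\log D(G(I_{LR}))$ and the weighting $L = L_{content} + \psi L_{GAN}$ with $\psi = 0.01$, the objectives can be written in NumPy as below. The content loss is passed in as a number; its exact perceptual form is not specified in this section, and the discriminator outputs in the usage are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against log(0)


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: push D(I_HR) towards 1 and D(G(I_LR)) towards 0."""
    return -np.mean(np.log(d_real + EPS) + np.log(1.0 - d_fake + EPS))


def generator_adversarial_loss(d_fake):
    """Non-saturating generator term: -log D(G(I_LR))."""
    return -np.mean(np.log(d_fake + EPS))


def total_generator_loss(content_loss, d_fake, psi=0.01):
    """Assumed weighting L = L_content + psi * L_GAN."""
    return content_loss + psi * generator_adversarial_loss(d_fake)


d_real = np.array([0.9, 0.8])   # hypothetical D(I_HR) outputs
d_fake = np.array([0.5, 0.5])   # hypothetical D(G(I_LR)) outputs
print(discriminator_loss(d_real, d_fake))
print(total_generator_loss(content_loss=1.0, d_fake=d_fake))
```

With $\psi = 0.01$ the adversarial term only nudges the generator towards sharper, more realistic textures, while the content loss still dominates and keeps the output faithful to the input.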

IV. EXPERIMENTS
A. EXPERIMENT SETUP
The experiments were performed on a server with an 8-core CPU and an NVIDIA RTX 2080 Ti GPU, which has 4,352 CUDA cores, 11 GB of memory, and a core frequency of up to 1545 MHz. PyTorch was used as the framework to build the network, and additional configuration parameters are listed in Table 1.

B. DATASETS
In this experiment, we first used the DIV2K [30] dataset to train our proposed super-resolution model. We downsampled the DIV2K [30] images with bicubic interpolation and added additive Gaussian noise to the low-resolution images to create clear and unclear image pairs. For pest classification, we used the 24 insect classes of the Xie1 [8] dataset and the 37 insect classes of the Xie2 [9] dataset. For Xie1, 1200 pest images were used to train the classification networks and 240 to test them; for Xie2, 3618 pest images were used for training and 667 for testing. We randomly rotated and flipped the images for data augmentation, applied batch normalization, and pre-processed all images for better training and testing results. We also made appropriate adjustments to the Xie1 dataset. Figure 9 and Figure 10 show examples from the Xie1 and Xie2 datasets, and Table 2 and Table 3 list their categories and numbers of images. For pest instance segmentation, we chose Diostrombus, Chauliops, and Callitettix as our research objects; all three are very common in real farmland and do great harm to agricultural production. Their appearance resembles their habitat, which makes them hard to distinguish from the background. We built a dataset with 58 images of each pest class as the instance segmentation samples. All pest images were downsampled with bicubic interpolation to one quarter of their original size to simulate low pixel resolution and blurred insect morphology.
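A minimal sketch of how a clear/unclear training pair can be built is shown below. To stay dependency-free it replaces bicubic interpolation with simple block averaging (a bicubic resize from an image library would be closer to the described setup), and the scale factor of 4 and noise standard deviation of 0.01 are assumed values, not settings reported here; `make_pair` is a hypothetical name.

```python
import numpy as np


def make_pair(hr, scale=4, noise_sigma=0.01, rng=None):
    """Build an (unclear LR, clear HR) training pair from an HR image in [0, 1].
    Block averaging is used as a simple stand-in for bicubic downsampling."""
    rng = np.random.default_rng(0) if rng is None else rng
    h = hr.shape[0] - hr.shape[0] % scale
    w = hr.shape[1] - hr.shape[1] % scale
    hr = hr[:h, :w]                                   # crop to a multiple of scale
    lr = hr.reshape(h // scale, scale, w // scale, scale, *hr.shape[2:]).mean(axis=(1, 3))
    lr = lr + rng.normal(0.0, noise_sigma, lr.shape)  # additive Gaussian noise
    return np.clip(lr, 0.0, 1.0), hr


hr = np.random.default_rng(1).random((130, 98, 3))
lr, hr_cropped = make_pair(hr)
print(lr.shape, hr_cropped.shape)  # (32, 24, 3) (128, 96, 3)
```

Cropping the HR image to a multiple of the scale keeps each LR pixel aligned with an exact block of HR pixels, so the pair stays pixel-consistent for supervised training.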

C. TRAINING DETAILS
In preprocessing, we downsampled the images with bicubic interpolation and added Gaussian noise to the low-resolution images to create clear and unclear image pairs. We augmented the training data by mirroring the images and rotating them in four directions. All input images were converted from RGB to the YCbCr color space, and only the Y channel was retained. Mini-batches of 32 images were used, and the initial learning rate was 0.0001, with momentum set to 0.9 and weight decay to 0.0001. The learning rate was divided by 10 every 60 epochs, and training ran for 300 epochs with the RMSProp optimizer. In addition, we pretrained the discriminative model from a trained VGG19 model to provide an initialization when training PSRGAN and avoid undesired local optima.

Figure 11 and Figure 12 show visual comparisons between our proposed method and other state-of-the-art PSNR-oriented methods, including SRdenseNet [31], DSRNLP [7], VDSR [32], SESR [33], and LapSRN [34], with PSNR and SSIM provided for reference. Figures 11 and 12 show that PSRGAN outperforms these PSNR-oriented methods in detail and sharpness, with clear advantages in reconstructing the overall profile and body details of insects. The previous PSNR-oriented super-resolution methods tend to generate blurry results with unpleasant artifacts, and their generated textures are unnatural and noisy. PSRGAN therefore reconstructs the detailed bodies of pests better than these methods and yields larger gains in classification accuracy and object instance segmentation recall rate.
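The preprocessing and learning-rate schedule from the training details can be sketched as below. The BT.601 coefficients (the convention of MATLAB's `rgb2ycbcr`) are an assumption, since the exact YCbCr variant is not stated, and reading "mirroring and rotating in four directions" as eight dihedral variants is our interpretation; all function names are hypothetical.

```python
import numpy as np


def rgb_to_y(rgb):
    """Y channel of ITU-R BT.601 YCbCr for RGB values in [0, 1]
    (MATLAB rgb2ycbcr convention); output lies in [16/255, 235/255]."""
    return (16.0 + 65.481 * rgb[..., 0] + 128.553 * rgb[..., 1] + 24.966 * rgb[..., 2]) / 255.0


def dihedral_augment(img):
    """Rotate the image and its mirror by 0/90/180/270 degrees: eight variants."""
    return [np.rot90(base, k) for base in (img, np.fliplr(img)) for k in range(4)]


def step_lr(epoch, base_lr=1e-4, drop=10.0, every=60):
    """Learning rate divided by `drop` every `every` epochs."""
    return base_lr / drop ** (epoch // every)


print([step_lr(e) for e in (0, 60, 120, 180, 240)])
```

Over 300 epochs this schedule has five stages, ending at 1e-8, so most of the useful learning happens in the first few stages.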

2) IMAGE CLASSIFICATION
We used several classification networks in our experiment: AlexNet [35], VGG-16 [20], Inception-v3 [36], ResNet-101 [25], ResNeXt-50 [37], DenseNet-121 [26], MobileNet V2 [38], and ShuffleNet V2 [39]. The classification networks were fine-tuned: we kept most of the weights of the original networks and trained only the softmax layer. We used 1200 images of the 24-class pest dataset for training and 240 for testing, and 3618 images of the 37-class pest dataset for training and 667 for testing. Cross-entropy was used as the loss function and Adam as the optimizer. More training details of the classification networks are given in Table 4 and Table 5. Table 6 and Table 7 show the classification accuracy on raw images and on images restored by SRdenseNet [31], DSRNLP [7], VDSR [32], SESR [33], LapSRN [34], RCAN [40], SAN [41], and our proposed PSRGAN. Tables 6 and 7 show that super-resolving pest images improves classification accuracy, and that PSRGAN outperforms the state-of-the-art PSNR-oriented super-resolution methods, yielding the largest improvement in classification accuracy.
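Training only the softmax layer amounts to fitting a linear classifier with cross-entropy on frozen backbone features. The sketch below uses plain full-batch gradient descent rather than Adam, and synthetic two-dimensional features in place of real CNN activations, so it illustrates the mechanics only; `train_softmax_head` is a hypothetical name.

```python
import numpy as np


def train_softmax_head(features, labels, num_classes, lr=0.1, epochs=300):
    """Fit only a linear softmax layer on frozen features with cross-entropy,
    using plain full-batch gradient descent (a simplification of Adam)."""
    n, d = features.shape
    w = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = features @ w + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n                    # dLoss/dlogits for mean cross-entropy
        w -= lr * features.T @ grad
        b -= lr * grad.sum(axis=0)
    return w, b


# Synthetic stand-in for frozen backbone features: three well-separated clusters.
rng = np.random.default_rng(0)
centers = np.array([[4.0, 0.0], [0.0, 4.0], [-4.0, -4.0]])
labels = rng.integers(0, 3, size=300)
features = centers[labels] + rng.normal(0.0, 1.0, size=(300, 2))
w, b = train_softmax_head(features, labels, num_classes=3)
accuracy = np.mean(np.argmax(features @ w + b, axis=1) == labels)
print(accuracy)
```

Because the backbone is frozen, the quality of its features on the (restored or unrestored) input images bounds what the softmax layer can achieve, which is why restoring image quality upstream can raise classification accuracy.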

3) OBJECT INSTANCE SEGMENTATION RECALL RATE
We used Mask R-CNN [10] as the object instance segmentation model and trained a segmentation system for Diostrombus based on ResNet-50, starting from a ResNet-50 model pretrained on the COCO dataset. The model was fine-tuned with the Adam optimizer at a fixed learning rate of 0.001 and had essentially converged after 20,000 iterations. Figures 13, 14, and 15 show the visualized instance segmentation results for Diostrombus, Chauliops, and Callitettix, respectively. These figures indicate that low-resolution pest images and images upscaled by bicubic interpolation were difficult for the system to segment, while images restored by super-resolution methods were segmented successfully. Moreover, the recognition confidence and bounding-box accuracy for the target pests are higher with our proposed method than with the previous state-of-the-art PSNR-oriented super-resolution methods.
Each pest class was tested on 58 low-resolution images. Table 8 shows the recall rates for untreated pest images and for images restored by bicubic interpolation, SRdenseNet [31], DSRNLP [7], VDSR [32], SESR [33], LapSRN [34], RCAN [40], SAN [41], and our proposed PSRGAN. Table 8 also lists the improvement in recall rate of our proposed method over the baseline on the three test datasets.
The experiment results show that, after reconstruction with our proposed PSRGAN, the recall rate increased by 182.89% compared with the original low-resolution, unclear pest images. PSRGAN also achieves a higher recall rate than the other super-resolution methods. These results show that our proposed method significantly improves instance segmentation of small objects that occupy only a few pixels.
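Because a recall rate cannot exceed 100%, an increase of 182.89 percentage points is impossible, so the reported figure is presumably a relative improvement over the low-resolution baseline. Under that assumption, with $R_{\text{LR}}$ denoting the recall on the original low-resolution images and $R_{\text{PSRGAN}}$ the recall after PSRGAN reconstruction (symbols introduced here for illustration), the improvement and its implication are:

$$
\Delta R = \frac{R_{\text{PSRGAN}} - R_{\text{LR}}}{R_{\text{LR}}} \times 100\%, \qquad
\Delta R = 182.89\% \;\Longrightarrow\; R_{\text{PSRGAN}} = 2.8289\, R_{\text{LR}}.
$$

In other words, the recall after reconstruction would be roughly 2.8 times the recall on the raw low-resolution images.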

4) SPEED
In this part, we measured the running time of the models. We reproduced SRdenseNet [31], DSRNLP [7], VDSR [32], SESR [33], and LapSRN [34] in PyTorch on an NVIDIA RTX 2080 Ti GPU. We selected 20 pest images from the Xie1 dataset and 16 pest images from the Xie2 dataset. As shown in Figure 16 and Figure 17, PSRGAN runs at high speed and achieves the best classification accuracy with MobileNet V2 [38] among the state-of-the-art methods.
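The paper does not describe its timing protocol. A minimal sketch of one common protocol, assumed here, runs a few warm-up passes and then averages wall-clock time over the test images. GPU kernels in frameworks such as PyTorch execute asynchronously, so a synchronization call (for example `torch.cuda.synchronize`) is needed before reading the clock; the sketch takes it as an optional callable so that it also runs without a GPU. `mean_latency` and its parameters are hypothetical.

```python
import time


def mean_latency(model_fn, images, warmup=3, sync=None):
    """Average wall-clock seconds per image for model_fn (hypothetical helper).

    model_fn: callable applied to one image (e.g. a super-resolution model).
    warmup:   number of leading images run once before timing (not timed).
    sync:     optional callable that blocks until queued device work is done,
              e.g. torch.cuda.synchronize on a GPU; None on CPU.
    """
    for img in images[:warmup]:
        model_fn(img)  # warm-up: caches, allocators, kernel selection
    if sync:
        sync()
    start = time.perf_counter()
    for img in images:
        model_fn(img)
    if sync:
        sync()  # make sure all timed work has actually finished
    return (time.perf_counter() - start) / len(images)


# Usage with a stand-in "model" that takes about 10 ms per image.
print(mean_latency(lambda img: time.sleep(0.01), [None] * 5, warmup=1))
```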

V. DISCUSSION
PSRGAN is a GAN with residual and dense fusion and quadra-attention mechanisms. To take full advantage of both the residual and the dense fusion mechanisms, we apply residual and dense connections within a single layer. Compared with a residual network, the generator of PSRGAN retains more information from previous layers, which enables our network to preserve contiguous memory. Compared with a dense network, PSRGAN halves the channel growth, which substantially reduces the number of network parameters and makes the deeper structure trainable. We also use four attention mechanisms: spatial attention, channel attention, texture attention, and self-attention. Spatial attention helps enhance details during reconstruction and improves the effect of residual and dense fusion. Channel attention learns autonomously to boost significant channels and suppress useless ones. Texture attention assigns attention based on guidance from edges, which helps use edges as a global spatial attention mechanism for image reconstruction. Self-attention captures the global geometric features of an image in one step by directly computing the relationship between any two pixels, so it can model the dependencies between global features well. Additionally, the generator of PSRGAN uses CARAFE upsampling, which has a larger receptive field during feature reassembly and guides the reassembly process according to the input content. CARAFE upsampling also helps PSRGAN achieve good results in segmentation tasks.
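The paper does not give the exact formulation of its self-attention module. The following is a minimal, framework-free sketch of a common non-local (SAGAN-style) formulation, assumed here: each pixel's query is compared with every pixel's key, the scores are softmax-normalized, and the resulting weights average the value vectors; a scale `gamma` blends the result back into the input. The projection matrices, `gamma`, and the function names are hypothetical.

```python
import math


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def self_attention(pixels, Wq, Wk, Wv, gamma):
    """Self-attention over a flattened feature map (hypothetical sketch).

    pixels: list of N feature vectors, one per pixel, each with C channels.
    Wq, Wk: query/key projections (C' x C); Wv: value projection (C x C).
    gamma:  residual scale (SAGAN initialises it to 0).
    Returns N vectors: gamma * attended_values + input.
    """
    q = [matvec(Wq, p) for p in pixels]
    k = [matvec(Wk, p) for p in pixels]
    v = [matvec(Wv, p) for p in pixels]
    n = len(pixels)
    out = []
    for j, p in enumerate(pixels):
        # Affinity between pixel j and every pixel i, computed directly.
        scores = [sum(a * b for a, b in zip(q[j], k[i])) for i in range(n)]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        total = sum(w)
        w = [x / total for x in w]  # softmax over all pixels
        attended = [sum(w[i] * v[i][c] for i in range(n)) for c in range(len(p))]
        out.append([gamma * a + x for a, x in zip(attended, p)])
    return out
```

Because the affinity between all N pixel pairs is computed explicitly, the cost of this formulation grows as O(N²) in the number of pixels; this is the price of relating any two pixels in a single step.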
According to the experiment results, the classification accuracy and the object instance segmentation recall rate for pest images transformed by super-resolution methods are better than those for the low-resolution pest images obtained from real farms. This is because images upscaled by super-resolution methods convey much more information than low-resolution images, such as high-frequency image details. Experiment results on two pest datasets clearly show that using low-resolution images results in lower accuracy. These results indicate that the super-resolution methods successfully reconstructed the high-frequency details of the pest images and facilitated pest identification. We also compared PSRGAN with the previous state-of-the-art PSNR-oriented super-resolution methods and found that our proposed method performs better in improving both classification accuracy and the object instance segmentation recall rate.
Although PSRGAN has many advantages over previous super-resolution methods, some limitations remain to be overcome. Previous state-of-the-art PSNR-oriented super-resolution methods typically minimize pixel-wise losses, which drives the output toward the average of all plausible solutions; this raises PSNR but makes images overly smooth. As a result, these methods tend to generate blurry results and introduce unpleasant artifacts. Besides, the textures they generate are unnatural and contain unpleasant noise, which hampers improvements in classification accuracy and object instance segmentation recall rate. PSRGAN does not rely on such averaging, which yields better visual quality but lower PSNR than the PSNR-oriented super-resolution methods. PSRGAN also has significant advantages in reconstructing the overall profile and body details of insects. Therefore, PSRGAN reconstructs the detailed bodies of pests better than the previous state-of-the-art PSNR-oriented super-resolution methods and improves classification accuracy and object instance segmentation recall rate more.
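The averaging argument can be made concrete with a toy example, constructed here for illustration and not taken from the paper. Suppose two sharp high-resolution patches (a 1-D edge at two neighbouring positions) are equally consistent with the same low-resolution input. Under mean squared error, from which PSNR is computed, predicting their pixel-wise average scores better in expectation than predicting either sharp patch, even though the average is a blurred edge that matches neither. The patch values are hypothetical.

```python
import math


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(mse_value, max_value=1.0):
    return 10 * math.log10(max_value ** 2 / mse_value)


# Two equally likely sharp patches: an edge at two slightly different positions.
sharp_a = [0.0, 0.0, 1.0, 1.0]
sharp_b = [0.0, 1.0, 1.0, 1.0]
average = [(x + y) / 2 for x, y in zip(sharp_a, sharp_b)]  # blurred edge


def expected_mse(prediction):
    """MSE averaged over the two equally likely ground truths."""
    return 0.5 * mse(prediction, sharp_a) + 0.5 * mse(prediction, sharp_b)


print(expected_mse(sharp_a), expected_mse(average))  # 0.125 0.0625
# PSNR (in dB) computed from the expected MSE of each prediction.
print(round(psnr(expected_mse(sharp_a)), 2), round(psnr(expected_mse(average)), 2))  # 9.03 12.04
```

The blurred average wins by about 3 dB of PSNR, which is exactly the bias toward over-smoothed outputs described above.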

VI. CONCLUSION
In this work, we propose a novel image super-resolution method for agricultural pest images. To the best of our knowledge, our proposed method is the first to introduce GANs into agricultural pest image processing. We use both residual and dense connections to retain more information from previous layers, which helps preserve contiguous memory, and to significantly reduce the number of network parameters, making deeper structures trainable. Moreover, the quadra-attention mechanism provides a crucial performance increase. Spatial attention enhances details during reconstruction. Channel attention boosts important channels and suppresses useless ones. Texture attention assigns attention based on texture features and uses textures as a global spatial attention mechanism during image reconstruction. Self-attention models the dependencies between global features well. Our experimental results show that PSRGAN outperforms state-of-the-art super-resolution methods in terms of visual quality, classification, and instance segmentation performance. Owing to its effective attention mechanisms and residual and dense fusion, PSRGAN not only improves classification accuracy and object instance segmentation recall rate but also reduces the number of network parameters. Our proposed method also makes it possible to deploy fewer cameras in farmland and thus save costs, which is practical for real-world applications.