Edge Computing-Assisted DNN Image Recognition System With Progressive Image Retransmission

Deep learning-based image recognition systems have evolved rapidly. Because of the heavy processing load that deep neural networks (DNNs) impose on graphics processing units (GPUs), DNN models are typically deployed on cloud servers, and images or videos are forwarded from user terminals through the network to the server. In recent years, edge computing has gained popularity as a means of reducing data traffic in the backbone network. However, the last-mile access network between an edge server and user terminals can still become congested because large amounts of data, such as video and image files, must be forwarded. In particular, when computer vision applications such as image recognition run in the edge network, large amounts of data are forwarded even though the edge server may not always need high-definition images. This paper proposes an image compression and progressive retransmission scheme for deep learning-based image recognition systems to reduce image data traffic and alleviate network congestion. The proposed method introduces an entropy-based threshold calculated from the posterior probabilities at the output layer of a deep learning model. Entropy is an effective metric because it can serve as an indicator independent of the number of classification labels in the DNN model. The thresholding controls image retransmission and reduces traffic while maintaining image recognition accuracy. We implement the proposed scheme on the edge server and reveal the relationship between data compression and recognition accuracy through simulation. The results show that an entropy-based threshold reduces the overall ambiguity of image recognition. Moreover, the retransmission scheme becomes more effective when combined with a more accurate recognition model.

low-latency services because the user terminals and edge servers terminate all processing without the support of cloud servers. Note that this paper assumes that edge computing networks include not only 5G/6G networks but also networks covering small areas such as shopping centers and stadiums. In particular, when edge computing provides computer vision applications such as image recognition and object detection using DNNs, reducing the amount of data forwarded through the network is a critical issue [8], [9], as shown in Fig. 1. Self-driving cars [10], [11], surveillance camera analysis [12], [13], and traffic navigation services at tourist attractions are all examples of computer vision applications. The use of edge computing will grow in tandem with the expansion of these services.
Furthermore, the amount of data collected from users increases year after year, so the network between the edge server and user terminals is likely to become congested in the near future. Our goal is therefore to prevent throughput degradation, latency increases, and network congestion in edge computing systems that support computer vision applications. In particular, this study focuses on DNN-based image recognition systems. Image compression such as JPEG encoding is a common and straightforward technique for reducing data traffic. However, when an image is compressed at a high rate, the compression degrades the estimation accuracy of the DNN [14]. There is thus a trade-off between image compression and DNN estimation accuracy. Moreover, the proper compression rate depends on the image. In addition, DNN models have evolved significantly and new models continue to be proposed; therefore, the compression method should be independent of the DNN model. To overcome these problems, this paper proposes an edge-assisted image recognition method with image compression and progressive retransmission. The proposed method improves the trade-off between network efficiency and recognition accuracy in edge-assisted image recognition. The edge server estimates the recognition accuracy and requests retransmission from the user terminal when the estimated accuracy is low. The user terminal then retransmits a higher-quality image. In this way, the proposed method aims to guarantee the estimation accuracy of image recognition.
The contributions of this paper are as follows:
• Entropy can be employed as an indicator for the retransmission decision. An entropy-oriented decision is independent of the number of labels of the DNN because it does not rely on the top-k output of the last layer. Thus, entropy is a generic indicator.
• The proposed edge-assisted image recognition system reduces network traffic. Using progressive JPEG reduces traffic further, and we confirm its effectiveness through simulation analysis.
• The proposed retransmission scheme operates independently of the DNN estimation model, so it can be applied to any DNN-based image recognition model.
This paper evaluates the estimation accuracy and entropy of image recognition while varying the image compression rate. We compare the proposed scheme with an image compression system without retransmission in terms of entropy and data size reduction.
This research is an expanded version of our previous work, published in IEEE VTC-Fall 2021 [15]. We extended the previous work in the following three aspects:
• AlexNet, used in the VTC-Fall version, is replaced with two major image recognition models, ResNet and EfficientNet, which are widely used in practical image recognition systems.
• The explanation and discussion of the proposed system were extended, and simulation data were obtained with various parameters.
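• Entropy was introduced as the retransmission decision indicator.
The remainder of this paper is organized as follows. Section II introduces related work, and Section III describes the proposed method. Section IV investigates the properties of entropy and top-k error, and Section V presents the experimental evaluation. Section VI discusses the limitations of the proposed system, and Section VII concludes the paper.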

II. RELATED WORK
This section introduces representative schemes for running DNNs with edge computing and explains the advantages of our proposed scheme. Several schemes have been reported for edge computing systems that run DNNs for IoT applications while avoiding congestion between the edge server and user terminals. For example, compressing images at the user terminals before transmitting them to the edge server is a useful approach.
J. Ren et al. proposed an image compression scheme for object detection based on the region of interest (ROI) [16]. The ROI is the area that includes the target object to be recognized, and the scheme applies a higher compression rate (i.e., lower quality) to the background region. In a related study, we proposed multiple-ROI transmission schemes that reduce the transmission of background images under narrow bandwidth and high packet loss [17]. Li et al. proposed an image compression scheme that focuses on the differences in the image quality required by each application [18]. Their scheme adaptively selects the JPEG compression rate between the edge server and the user terminals using an agent designed by reinforcement learning. These works employ traditional compression methods and use a DNN to update the compression rate. In addition, JPEG encoding may not be optimal for DNN-based image recognition because its compression is tailored to human vision. Therefore, a method that reconfigures JPEG encoding for DNNs has been proposed [19].
Besides image compression, a new method called split computing has been proposed to enable network-efficient edge-assisted image recognition [20]. In split computing, the DNN model is split into a head network and a tail network, which are deployed on a user terminal and an edge server, respectively. The user terminal feeds its captured image into the head network and forwards the output of the hidden layer to the edge server. The server then processes the rest with the tail network. Split computing can reduce traffic and latency by introducing a bottleneck architecture into the head network. Matsubara et al. studied an efficient way to train the head network to reduce network traffic without degrading model performance [21]. Itahara et al. studied a model tuning method to improve robustness against compression and network-induced packet losses [22]. However, split computing requires the user terminal to have enough computing power to run the head network, whereas IoT devices such as network cameras and wearable sensors often lack such computing power.
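To make the head/tail partition concrete, the toy sketch below represents a model as a list of layer functions and splits it at a hypothetical index k. The layer weights, the 4-to-2 bottleneck, and the helper names (`dense`, `run`, `split`) are illustrative assumptions, not the architectures studied in [20]-[22]. The sketch only shows that running the head on the terminal and the tail on the server reproduces the full model's output, while the forwarded hidden vector is smaller than the input.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def dense(weights: Sequence[Sequence[float]]) -> Layer:
    """Return a toy ReLU dense layer (no bias) with the given weight matrix."""
    def layer(x: Vector) -> Vector:
        return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]
    return layer


def run(layers: Sequence[Layer], x: Vector) -> Vector:
    """Apply the layers in order."""
    for layer in layers:
        x = layer(x)
    return x


def split(layers: Sequence[Layer], k: int) -> Tuple[List[Layer], List[Layer]]:
    """Head = first k layers (user terminal); tail = the rest (edge server)."""
    return list(layers[:k]), list(layers[k:])


# Hypothetical 4 -> 2 -> 3 model; the 2-unit layer acts as the bottleneck.
model = [dense([[1, 0, 1, 0], [0, 1, 0, 1]]), dense([[1, 1], [1, -1], [0, 2]])]
head, tail = split(model, 1)

x = [1.0, 2.0, 3.0, 4.0]
hidden = run(head, x)        # computed on the terminal, sent over the network
y_split = run(tail, hidden)  # computed on the edge server
```

The head must still execute on the terminal, which is exactly the computing requirement that many IoT devices cannot meet.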
The key features of our proposed scheme are as follows:
• When the estimation model is a classifier, the proposed scheme neither modifies nor retrains the DNN model, so it can still be applied when the model is updated. The proposed scheme uses the entropy calculated from the posterior probability distribution output by the softmax layer of the DNN as the image retransmission decision indicator. Any estimation model can be used as long as the posterior probability distribution is available.
• The proposed scheme does not interfere with the related work introduced in this section, so both can be employed at the same time.
The next section describes the operation of the proposed scheme in detail.

III. PROPOSED EDGE-ASSISTED IMAGE RECOGNITION
A. OVERVIEW
This section describes the concept of the proposed scheme. Fig. 2 (a)-(d) shows candidate edge computing system configurations for traffic reduction. Fig. 2 (a) is the configuration of normal image recognition with edge computing. The user terminal sends the original image to the edge server. Before recognizing the image with the DNN, the edge server downsamples it to match the input size of the DNN. For example, the input size of ResNet, a popular DNN model, is 224 × 224 pixels. Although Fig. 2 shows only downsampling, the image is up-sampled when it is smaller than the input size. In Fig. 2 (b), the user terminal performs the downsampling in advance. This reduces the image size without affecting recognition accuracy, because the edge server would downsample the image to the same size anyway. Still, the data size equals the total number of pixels multiplied by 24 bits. Therefore, greater traffic reduction is expected when the user terminal also performs JPEG encoding before transmission, as shown in Fig. 2 (c). However, lossy compression decreases the recognition accuracy. Fig. 2 (d) shows the proposed retransmission scheme. In the proposed scheme, users downsample and compress images before sending them to the edge server, which recognizes them with the DNN. When the estimated accuracy falls below a predefined threshold, the edge server sends a retransmission request message called Image-NAK. Upon receiving an Image-NAK, the user terminal lowers the compression rate and resends the image to the edge server. When the edge server achieves sufficient accuracy, it sends an acknowledgment message called Image-ACK and ends the forwarding process. In addition, the edge server terminates the process when the number of image retransmissions reaches a certain limit. Note that Image-ACK and Image-NAK are application-level messages and are distinct from the acknowledgment messages of the underlying TCP connection.
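The Image-ACK/Image-NAK exchange of Fig. 2 (d) can be summarized as a short control loop. The following is a minimal sketch, not the authors' implementation: `request_image` and `recognize` are hypothetical callbacks standing in for the user terminal and the DNN on the edge server, the confidence score is assumed to grow with confidence, and sending an Image-ACK when the retransmission limit is reached is an assumption, since the text only says that the server terminates the process.

```python
from typing import Callable, List, Tuple


def edge_session(
    request_image: Callable[[int], bytes],
    recognize: Callable[[bytes], Tuple[int, float]],
    threshold: float,
    max_retrans: int,
) -> Tuple[int, int, List[str]]:
    """Run one recognition session on the edge server (hypothetical API).

    request_image(r) returns the image data of round r (0 = initial
    transmission); recognize(data) returns (label, confidence score).
    Returns the label, the number of retransmissions, and the messages
    the edge server sent to the user terminal.
    """
    messages: List[str] = []
    r = 0
    while True:
        label, score = recognize(request_image(r))
        # Accept if confident enough, or stop at the retransmission limit
        # (assumption: the server also sends Image-ACK in the latter case).
        if score >= threshold or r == max_retrans:
            messages.append("Image-ACK")
            return label, r, messages
        messages.append("Image-NAK")  # ask for a less compressed image
        r += 1
```

With entropy as the indicator, `score` could be taken as the negative entropy and `threshold` as the negative of the entropy threshold, so that "more confident" still means "larger score".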

B. IMAGE COMPRESSION FORMAT
We consider two image compression formats. The first is baseline JPEG encoding, standardized by ISO/IEC JTC 1/SC 29. The second is progressive JPEG. Progressive JPEG stores the binary data in order, starting with the lower-resolution (lower-frequency) components of the image. In other words, the image can still be opened when the binary data is truncated partway, and the earlier the data is truncated, the coarser the image. In contrast, a standard (baseline) JPEG image cannot be opened when its binary data is truncated partway. Progressive JPEG data contain markers, and the compression rate is determined by the marker position at which the data are cut. For example, Fig. 3 shows a highly compressed image obtained from the binary data up to the first marker, together with the original image decoded from all of the binary data. In Fig. 3, the maximum number of markers is assumed to be n. The highly compressed image lacks the higher-frequency components. The proposed scheme controls the compression rate by referring to the markers. In this paper, the marker number is referred to as the compression step.
VOLUME 10, 2022
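The paper does not name the marker type. Assuming the markers are the start-of-scan (SOS, 0xFFDA) markers that separate the successive scans of a progressive JPEG stream, the compression steps can be located with a minimal parser such as the one below. It walks the length-prefixed marker segments, skips the entropy-coded data of each scan (where a literal 0xFF byte is stuffed as 0xFF 0x00 and restart markers 0xFFD0-0xFFD7 may appear), and records where each scan ends. Numbering compression step s as "the first s scans" is an assumption of this sketch, and `scan_ends` and `truncate_to_step` are hypothetical helper names.

```python
from typing import List


def scan_ends(data: bytes) -> List[int]:
    """Byte offset just after the entropy-coded data of each scan, in order."""
    if data[:2] != b"\xff\xd8":                      # SOI marker
        raise ValueError("not a JPEG stream")
    ends: List[int] = []
    i = 2
    while i + 1 < len(data):
        if data[i] != 0xFF:
            raise ValueError(f"expected a marker at byte {i}")
        marker = data[i + 1]
        if marker == 0xFF:                           # fill byte before a marker
            i += 1
        elif marker == 0xD9:                         # EOI: end of image
            break
        elif marker == 0xDA:                         # SOS: a new scan starts
            j = i + 2 + int.from_bytes(data[i + 2:i + 4], "big")
            # Skip entropy-coded bytes up to the next real marker, i.e., 0xFF
            # not followed by a stuffed 0x00 or a restart marker 0xD0-0xD7.
            while j + 1 < len(data) and not (
                data[j] == 0xFF
                and data[j + 1] != 0x00
                and not 0xD0 <= data[j + 1] <= 0xD7
            ):
                j += 1
            ends.append(j)
            i = j
        else:                                        # length-prefixed segment
            i += 2 + int.from_bytes(data[i + 2:i + 4], "big")
    return ends


def truncate_to_step(data: bytes, step: int) -> bytes:
    """Keep the headers and the first `step` scans, then close with EOI."""
    ends = scan_ends(data)
    if not 1 <= step <= len(ends):
        raise ValueError(f"step must be between 1 and {len(ends)}")
    return data[:ends[step - 1]] + b"\xff\xd9"
```

With Pillow installed, a progressive stream can be produced with `img.save(buf, format="JPEG", quality=95, progressive=True)`. libjpeg's default progression script typically splits a color image into 10 scans, but the scan count is an encoder choice rather than a property of the format.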

C. OPERATION OF USER TERMINALS
1) BASELINE JPEG CASE
User terminals first downsample the image and encode it in the standard (baseline) JPEG format. The JPEG quality, in the range of 0-100%, is predetermined for each transmission. When the user terminal receives an Image-NAK, it recompresses the image with the next designated compression rate. This process is repeated until an Image-ACK is received or the retransmission limit is reached.
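Because a baseline JPEG file cannot be opened from a prefix, every retransmission in this case resends a complete, re-encoded image, so the traffic of one session is the sum of all transmitted file sizes. The minimal sketch below makes that bookkeeping explicit. `encode` is a hypothetical callback standing in for a JPEG encoder, and the quality schedule is illustrative rather than the values used in the evaluation.

```python
from typing import Callable, Sequence


def baseline_session_bytes(
    encode: Callable[[int], bytes],
    qualities: Sequence[int],
    accepted_round: int,
) -> int:
    """Total bytes sent if the edge server accepts in `accepted_round`.

    Round 0 is the initial transmission and round r uses qualities[r].
    Every round resends a complete baseline JPEG file, so earlier
    transmissions cannot be reused by the edge server.
    """
    if not 0 <= accepted_round < len(qualities):
        raise ValueError("accepted_round exceeds the retransmission limit")
    return sum(len(encode(q)) for q in qualities[: accepted_round + 1])


def fake_encode(quality: int) -> bytes:
    """Hypothetical encoder whose output grows with quality (10 bytes/point)."""
    return bytes(10 * quality)


schedule = [10, 50, 95]  # illustrative quality schedule, not from the paper
```

In practice, `encode` could wrap Pillow's `Image.save(..., format="JPEG", quality=q)` on an in-memory buffer.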

2) PROGRESSIVE JPEG CASE
User terminals first downsample the image and convert it into the progressive JPEG format. Algorithm 1 shows the operation of the user terminals in the proposed scheme. The image in progressive JPEG format is converted into binary data $D_{orig}$. The algorithm then reads the current compression step $\sigma_c$, where $c$ is the compression step number, and the initial compression step is denoted by $\sigma_i$. Receiving an Image-NAK implies that the image has already been forwarded at least once. Thus, the algorithm extracts the binary data $D_p$ located between the previous compression step $\sigma_c$ and the newly designated compression step $\sigma_f$. If no Image-NAK has been received, i.e., the image is being forwarded for the first time, the user terminal forwards the binary data from the beginning up to the initial compression step $\sigma_i$.
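Algorithm 1 essentially slices $D_{orig}$ between two compression steps. The minimal sketch below assumes that the byte offset at which each compression step ends is already known (for instance, from the scan boundaries of the progressive stream). The function name, the `step_ends` list, and the convention that step 0 means "nothing sent yet" are illustrative assumptions rather than the paper's Algorithm 1.

```python
from typing import Sequence


def progressive_chunk(
    d_orig: bytes, step_ends: Sequence[int], sigma_c: int, sigma_f: int
) -> bytes:
    """Return D_p: the bytes between compression steps sigma_c and sigma_f.

    step_ends[s - 1] is the byte offset at which step s ends; sigma_c = 0
    means nothing has been sent yet, so the chunk starts at byte 0.
    """
    if not 0 <= sigma_c < sigma_f <= len(step_ends):
        raise ValueError("require 0 <= sigma_c < sigma_f <= number of steps")
    start = 0 if sigma_c == 0 else step_ends[sigma_c - 1]
    return d_orig[start:step_ends[sigma_f - 1]]


d_orig = bytes(range(10))   # placeholder for a progressive JPEG byte stream
step_ends = [2, 5, 10]      # hypothetical ends of steps 1, 2, and 3
first = progressive_chunk(d_orig, step_ends, 0, 1)  # initial transmission
rest = progressive_chunk(d_orig, step_ends, 1, 3)   # after one Image-NAK
```

Because consecutive chunks never overlap, the edge server can rebuild the stream by simple concatenation, $D \leftarrow D + D_p$; a decoder may additionally need an end-of-image marker appended to open an intermediate prefix.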
Although the user terminal keeps sending data to the edge server until image recognition is complete, no data are forwarded twice thanks to progressive transmission. From this perspective, the proposed scheme contributes to traffic reduction. Algorithm 2 shows the operation of the edge server. The edge server concatenates the newly received binary data $D_p$ with the already received data $D$ and composes the image $y$, which is input into the DNN prediction model. Note again that we use a pre-trained DNN model, obtain only its output, and calculate the posterior probability; that is, the DNN model need not be retrained. A threshold must be set for the retransmission decision, and entropy and top-k error are introduced as decision indicators. The server calculates the entropy $E(y)$ from the posterior probabilities $p(x_i \mid y)$ at the output layer:
$$E(y) = -\sum_{i=1}^{L} p(x_i \mid y) \log p(x_i \mid y),$$
where $y$ is the input image, $x_i$ is the $i$-th label, and $L$ is the total number of labels. In the top-k case, the posterior probabilities up to the $k$-th largest are summed:
$$P_k(y) = \sum_{j=1}^{k} p(x_{(j)} \mid y),$$
where $x_{(j)}$ denotes the label with the $j$-th largest posterior probability. The prediction model is combined with a softmax layer to convert the logits into pseudo-posterior probabilities. Because a lower entropy indicates a more confident prediction, the edge server requests retransmission when $E(y)$ exceeds the threshold $E_{th}$; in the top-k case, it requests retransmission when $P_k(y)$ falls below the corresponding threshold. The thresholds are predetermined, and the maximum number of retransmissions is also limited. The next section examines how the entropy varies with the compression step for ResNet and EfficientNet, which are typical image recognition models.
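Both decision indicators can be computed from the output logits in a few lines. This is a minimal sketch under stated assumptions: the natural logarithm is used (another base only rescales the entropy, so its threshold would have to be rescaled with it), the softmax is stabilized by subtracting the largest logit, and because a lower entropy indicates a more confident prediction, an Image-NAK is issued when the entropy exceeds its threshold or when the top-k sum falls below its threshold. The function names and threshold values are illustrative.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert logits into pseudo-posterior probabilities p(x_i | y)."""
    m = max(logits)  # subtract the largest logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """E(y) = -sum_i p(x_i|y) * ln p(x_i|y); terms with p = 0 contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def top_k_sum(probs: Sequence[float], k: int) -> float:
    """Sum of the k largest posterior probabilities."""
    return sum(sorted(probs, reverse=True)[:k])


def nak_by_entropy(probs: Sequence[float], e_th: float) -> bool:
    """Send Image-NAK when the prediction is too uncertain (high entropy)."""
    return entropy(probs) > e_th


def nak_by_top_k(probs: Sequence[float], k: int, p_th: float) -> bool:
    """Send Image-NAK when the top-k probability mass is too small."""
    return top_k_sum(probs, k) < p_th


uniform = softmax([0.0, 0.0, 0.0, 0.0])  # maximally uncertain over 4 labels
peaked = softmax([10.0, 0.0, 0.0, 0.0])  # confident prediction
```

Neither indicator needs the ground-truth label, which is what lets the edge server make the decision online.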

IV. INVESTIGATION OF ENTROPY AND TOP-K ERROR PROPERTIES
A. SETUP
The proposed scheme must set the following parameters in advance to retransmit the compressed images.
• JPEG quality and compression steps for the initial transmission and retransmissions,
• Top-k and entropy thresholds for judging prediction accuracy.
These parameters depend on the dataset and the prediction model. This paper uses the ImageNet dataset [23], which includes 1,200,000 training images, 50,000 validation images, and 100,000 test images of 1,000 classes. In this section, we used the test dataset to set the retransmission thresholds and the compression steps. The validation dataset was used in the next section to evaluate the feasibility of the proposed scheme. We separated the test and validation datasets to avoid overfitting. The DNN prediction models were:
• ResNet-50,
• EfficientNet-B7.
The input sizes of ResNet-50 and EfficientNet-B7 are 224 × 224 pixels and 600 × 600 pixels, respectively. We used the pre-trained models provided by the TensorFlow library as they are, with no additional training and no change in the layer structure. Because the proposed scheme performs downsampling and JPEG encoding as preprocessing, this section studies how the JPEG quality, compression step, and data size relate to the top-1 output, top-5 output, and entropy. All simulations were carried out on a machine with an NVIDIA RTX 3090 GPU (24 GB) and an AMD Ryzen 7 3700X CPU. Fig. 4 (a) shows the normalized data size as the JPEG quality changes. The normalized data size is the total data size of the compressed images in the test dataset divided by that of the original images. The solid line and the colored region show the average value and the standard deviation, respectively. The data size changes nonlinearly with the quality. Fig. 4 (b)-(d) show the top-1 output, the top-5 output, and the entropy as the JPEG quality changes.
The solid and dashed lines indicate the average and median values, respectively. For both ResNet-50 and EfficientNet-B7, the slopes of the curves change at a JPEG quality of around 10-20%. We used the mean or median value as the threshold for judging prediction accuracy, taking the values around these inflection points as the retransmission thresholds. Table 1 summarizes the threshold values.
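The setup above rests on a few simple computations: the uncompressed size of a DNN input (pixels × 24 bits), the normalized data size of Fig. 4 (a), and the mean or median thresholds of Table 1. The sketch below computes each under stated assumptions: per-image sizes and indicator values are assumed to be available as Python lists, the helper names are hypothetical, and the sample values are placeholders rather than the paper's measurements.

```python
import statistics
from typing import Dict, Sequence


def raw_rgb_bytes(width: int, height: int, bits_per_pixel: int = 24) -> int:
    """Uncompressed image size in bytes (24 bits per pixel = 3 bytes)."""
    return width * height * bits_per_pixel // 8


def normalized_data_size(compressed: Sequence[int], original: Sequence[int]) -> float:
    """Total compressed size divided by total original size over a dataset."""
    if len(compressed) != len(original):
        raise ValueError("compressed and original sizes must be paired")
    return sum(compressed) / sum(original)


def candidate_thresholds(values: Sequence[float]) -> Dict[str, float]:
    """Mean and median of per-image indicator values (e.g., entropy)."""
    return {"mean": statistics.fmean(values), "median": statistics.median(values)}


print(raw_rgb_bytes(224, 224))  # 150528 bytes for a ResNet-50 input
print(raw_rgb_bytes(600, 600))  # 1080000 bytes for an EfficientNet-B7 input
print(candidate_thresholds([0.5, 1.0, 4.5]))  # {'mean': 2.0, 'median': 1.0}
```

For a right-skewed indicator such as the toy entropies above, the mean exceeds the median. Under a rule that retransmits when the entropy exceeds the threshold, a mean threshold would therefore trigger fewer retransmissions than a median threshold.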

Fig. 4 (e) shows the normalized average data size as a function of the compression step of the progressive JPEG images. We converted the baseline JPEG images of the ImageNet test dataset into the progressive JPEG format with a JPEG quality of 95%. The data size changes steeply at compression steps 2, 4, and 6. Fig. 4 (f)-(h) show the top-1 output, top-5 output, and entropy. The curves in Fig. 4 (e)-(h) change at the same inflection points. We set the retransmission threshold to the mean or median value at compression step 10. Table 2 summarizes the threshold values.

V. EXPERIMENTAL EVALUATION
A. SETUP
This section evaluates the proposed scheme on the ImageNet validation dataset. Table 3 lists the ten evaluation items: indexes (1)-(4) use the baseline JPEG format, and indexes (5)-(10) use the progressive JPEG format. For the retransmission decision, we employ two threshold patterns, the mean and the median shown in Tables 1 and 2, together with three criteria: the top-1 error, the top-5 error, and the entropy. The fourth, fifth, and sixth columns of Table 3 give the JPEG quality or compression step used in each transmission. The fourth column corresponds to the initial transmission (labeled "Trans."), and the fifth and sixth columns correspond to the first and second retransmissions (labeled "1st retrans." and "2nd retrans."), respectively; the maximum number of retransmissions is set to two. For example, in the progressive JPEG case of index (5), the user terminal forwards the binary data from step 0 to step 1 in the initial transmission, from step 2 to step 4 in the first retransmission, and from step 5 to $\sigma_{max}$ in the second retransmission.
For comparison, we prepared baseline JPEG transmission without retransmission, varying the JPEG quality from 10% to 95%. We simulated the relationship between the forwarded data size and the top-1 error, the top-5 error, and the entropy. A configuration of the proposed scheme has an advantage over baseline JPEG transmission without retransmission if it achieves both a smaller data size and a smaller error. The second experiment evaluated the number of retransmissions for all indexes in Table 3. Both evaluations employed ResNet and EfficientNet as the prediction models.
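The comparison criterion above can be made concrete by interpolating the baseline curve. As one possible operationalization, and an assumption rather than the paper's procedure, the sketch below treats the baseline JPEG results as (normalized data size, error) points, linearly interpolates the baseline error at the data size of a proposed configuration, and reports an advantage when the proposed error is lower. The helper names and sample points are placeholders.

```python
from bisect import bisect_left
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (normalized data size, error)


def baseline_error_at(size: float, curve: Sequence[Point]) -> float:
    """Linearly interpolate the baseline error at a given data size."""
    pts = sorted(curve)
    sizes = [s for s, _ in pts]
    if not sizes[0] <= size <= sizes[-1]:
        raise ValueError("data size lies outside the baseline curve")
    k = bisect_left(sizes, size)
    if sizes[k] == size:
        return pts[k][1]
    (s0, e0), (s1, e1) = pts[k - 1], pts[k]
    return e0 + (e1 - e0) * (size - s0) / (s1 - s0)


def has_advantage(point: Point, curve: Sequence[Point]) -> bool:
    """True if the point has a lower error than the baseline at the same size."""
    size, error = point
    return error < baseline_error_at(size, curve)


# Placeholder baseline JPEG curve, ordered from low to high quality.
baseline = [(0.1, 0.5), (0.3, 0.3), (1.0, 0.2)]
```

For a strictly decreasing baseline curve, lying below it at the same data size is equivalent to lying to its left at the same error, so either reading of "smaller data size and smaller error" gives the same verdict.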
Because the proposed system only adds a retransmission process and does not affect the DNN model, it is a novel approach for edge computing systems, and a direct comparison with the related work is difficult. To evaluate its fundamental effectiveness, we compared it with baseline JPEG. We used the published DNN models without modification such as retraining or fine-tuning.
This verification assumes an ideal communication channel, i.e., a channel without packet loss. In a practical channel, packet loss and forwarding latency would directly affect the retransmission delay; we leave this problem for future study. In this paper, we aim to reveal the fundamental potential of the proposed method.
is slightly effective in Fig. 5 (e). Meanwhile, the proposed scheme is effective in the entropy evaluation. In particular, when the entropy criterion was used for the decision, all points of the proposed scheme lie to the left of the blue line, as shown in Fig. 5 (i). Notably, the proposed scheme even achieved better results than the typical ResNet result indicated by the red line. A lower entropy suggests that the correct label can still be ranked near the top even when it falls outside the top-5.
Fig. 6 shows the results with EfficientNet. The red line indicates the typical EfficientNet results without additional compression. The errors are smaller because EfficientNet is more accurate than ResNet. For this reason, unlike in the ResNet case, the proposed scheme was also effective for the top-1 and top-5 errors. The entropy results were also better than in the ResNet case. This suggests that the more accurate the model, the more effective the proposed method.
Fig. 7 shows the breakdown of the number of retransmissions over all validation data. The horizontal axis is the index in Table 3, and the vertical axis is the ratio of images by number of retransmissions. For example, for index (1) in Fig. 7 (a), 40% of the validation images were accepted at the initial transmission, 20% after the first retransmission, and 40% after the second retransmission. Overall, the baseline JPEG cases were more likely to be accepted without retransmission, whereas the progressive JPEG cases were more likely to require retransmission.
In addition, about 40% to 50% of the images in the EfficientNet case required the second retransmission, whereas the ResNet case had a larger proportion of second retransmissions, because EfficientNet is more accurate than ResNet. Although the retransmission scheme works effectively, more retransmissions increase latency. Thus, the number of retransmissions should be limited when the proposed scheme is employed in mission-critical systems.
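The breakdown in Fig. 7 converts directly into the average number of transmissions per image, a first-order proxy for the added latency. The sketch below uses the index (1) example quoted in the text (40%, 20%, and 40% of images accepted after zero, one, and two retransmissions). Treating every transmission as one request-response round of equal cost is a simplifying assumption, and `mean_transmissions` is a hypothetical helper.

```python
from typing import Sequence


def mean_transmissions(ratios: Sequence[float]) -> float:
    """Average number of transmissions per image.

    ratios[r] is the fraction of images accepted after r retransmissions
    (r = 0 means accepted at the initial transmission).
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    return sum((r + 1) * share for r, share in enumerate(ratios))


# Index (1) of Fig. 7 (a), as quoted in the text: 40% / 20% / 40%.
print(mean_transmissions([0.4, 0.2, 0.4]))  # about 2.0
```

An average of about two transmissions means roughly one retransmission per image, which is why a retransmission limit matters for latency-sensitive deployments.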

VI. LIMITATION OF PROPOSED SYSTEM
The proposed scheme targets image classification systems. It calculates the entropy from the classification results to maintain classification accuracy. The proposed scheme cannot be directly applied to object detection in images containing multiple objects, or to segmentation. For object detection [24], one possible approach is to reset the ROI based on the accuracy of each detected object and to request retransmission of an image that includes only the minimum required pixels. However, this requires an indicator based on object detection accuracy. These problems are left for future work. For segmentation, it may be easy to apply the proposed scheme to a segmentation method using a belief map [25], in which case the entropy would be calculated from the conditional random field (CRF). Segmentation using attention [26] requires images at multiple scales, including high resolutions. Therefore, pre-compression on the user terminal side, as in the proposed method, may not be suitable for such segmentation schemes. In future work, we plan to extend the proposed method not only to object detection and segmentation but also to video data.

VII. CONCLUSION
This paper proposed an edge-assisted image recognition system with progressive retransmission to reduce image data traffic and alleviate network congestion. We introduced a threshold based on an entropy metric calculated from the posterior probabilities at the output layer of a deep learning model, and implemented the proposed scheme on the edge server. We first calculated practical thresholds for top-1, top-5, and entropy when using ResNet and EfficientNet. We then simulated the proposed image recognition system with baseline and progressive JPEG images using the calculated thresholds. The simulation results revealed the relationship between data compression and recognition accuracy. In the ResNet case, although the top-1 and top-5 results did not surpass those of the baseline compression method, the entropy result improved drastically. Moreover, in the EfficientNet case, the proposed system showed an improvement over the baseline method. This result implies that the more accurate the original DNN model, the more effective the proposed method. Future studies include applying the proposed scheme to more advanced computer vision applications such as object detection and conducting experiments on commercial edge systems.