Delay-Constrained Hybrid Task Offloading of Internet of Vehicles: A Deep Reinforcement Learning Method

The rapid development of the Internet of Things (IoTs) has driven the progress of intelligent transportation systems (ITS), which provide basic elements, such as vehicles, traffic lights, cameras, roadside units (RSUs), and their interconnecting 5G communications, to constitute the Internet of Vehicles (IoVs). In the IoVs, an intelligent vehicle can share information not only with infrastructure such as RSUs through vehicle-to-infrastructure (V2I) communication but also with other vehicles on the road through vehicle-to-vehicle (V2V) communication. We thus expect that vehicles can collaborate with well-resourced, idling vehicles, making full use of otherwise wasted resources. However, existing approaches cannot achieve this goal due to the increasingly strict delay constraints and the dynamic characteristics of IoVs tasks. To improve resource utilization and achieve better resource management, in this paper we propose a hybrid task offloading scheme (HyTOS) based on deep reinforcement learning (DRL), which achieves vehicle-to-edge (V2E) and V2V offloading by jointly considering the delay constraints and resource demands. To perform optimal offloading decision-making, we introduce a dynamic decision-making method, namely deep Q-networks (DQN). To verify the effectiveness of this approach, we choose three baseline offloading approaches (one game-theory-based and two single-scenario approaches) and perform a series of simulation experiments. The simulation results demonstrate that, compared to the baseline offloading approaches, our approach can effectively reduce task delay and energy consumption, achieving high-efficiency resource management.

The associate editor coordinating the review of this manuscript and approving it for publication was Amjad Mehmood.

Emerging IoVs applications demand massive computation and storage resources and impose strict latency requirements on vehicles. For instance, the latency of autonomous vehicle steering is expected to be less than 100 ms [1]. In addition, the real-time operating system of an autonomous vehicle needs to process about 1 GB of data per second, because hundreds of sensors on the vehicle generate a sea of data [2], which can easily exhaust the vehicle's onboard resources. Although a more powerful processor such as a GPU can be installed in vehicles to support the increasing computation demand, it will also incur higher energy consumption. Therefore, we propose a hybrid task offloading scheme. The main contributions of this paper are summarized as follows:

• For urban scenarios, we propose a hybrid task offloading scheme (HyTOS) for computation- and resource-intensive IoVs tasks, which considers the collaboration of V2E and V2V task offloading, making full use of scattered resources;

• We propose a dynamic offloading method based on DQN to jointly perceive the time-varying characteristics of vehicle tasks and the distribution of resources, optimizing the task delay and energy consumption;

• We conduct comparative experiments with state-of-the-art game-theory-based and single-scenario offloading approaches to evaluate the effectiveness and adaptability of the proposed approach.

The remainder of this paper is organized as follows. We summarize related works in Section II and present the system model and mathematical description in Section III. In Section IV, we first model the hybrid task offloading problem as a Markov decision process (MDP) and then propose a DQN-based hybrid offloading method to address it. Simulations are conducted in Section V to verify the effectiveness of the proposed method by comparison with benchmarks. Finally, we summarize the paper in Section VI.

Recent advances in the Internet of Things (IoTs) have spawned massive computing-intensive and delay-sensitive scenarios, such as virtual reality (VR), autonomous vehicles, healthcare IoTs, etc. [7]. Computation offloading is considered a promising technology to cope with these emerging trends. The latest research has addressed many new challenges, such as joint low-latency, secure, and reliable task offloading that integrates software-defined networking (SDN) and a blockchain scheme into healthcare IoTs [8], joint computation offloading and resource allocation in fog radio access network enabled IoTs [9], cache-assisted computation offloading in MEC systems to avoid duplicates in offloading [10], and balancing multiple system utilities rather than simple objective maximization [11]. More related works can be found in a recent survey [7]; this paper focuses on task offloading in the IoVs scenario.

Most studies still consider collaborative task offloading schemes and take vehicles into account. For example, [21] proposed a collaborative offloading approach that integrates CC, MEC, as well as vehicles, where the cooperative optimization problem is addressed with game-theory methods. Still, this work mainly focuses on CC and MEC collaboration, which is different from our work. Zhang et al.
[22] also proposed an approximate task offloading approach that jointly considers a V2V and V2I collaboration scheme, in which the edge resource is effectively utilized to reduce the processing delay. However, the above two works only considered single-target optimization and did not optimize the energy consumption.

On the contrary, some researchers merely consider V2V task offloading. Chen et al.
[6] studied a task offloading scenario that is purely supported by V2V collaboration; they formulated the offloading process as a min-max problem and solved it with the particle swarm optimization algorithm. However, purely considering V2V offloading cannot solve the task offloading problem perfectly, due to the limited vehicle resources and the service interruptions caused by movement. We believe that task offloading in urban vehicular scenarios must jointly consider the edge and idle vehicles, making full use of scattered resources.

Fig. 1 depicts the IoVs computation offloading of an urban scenario that supports V2E and V2V collaboration. In this scenario, vehicles from four directions gather at an intersection waiting for the traffic light. An RSU or MEC server is deployed near the vehicles and connected with a BS by a wired link. The red devices indicate the service vehicles (SVs) that can offer computing assistance. In contrast, yellow indicates an overloaded task vehicle (TV) that needs to offload its task to relieve its computational load. Both TVs and SVs connect to the MEC server in real time and send their task requirements as well as available computing resource capacities to the MEC server. Therefore, the MEC server acts as the global controller of the coverage area and, combined with the AI algorithm installed on it, automatically makes intelligent offloading decisions.

In this paper, we consider a scenario with multiple vehicles and a MEC server. All computing nodes that can provide services are denoted by N = {n_0, n_1, n_2, ..., n_m}, where n_0 represents the edge server (i.e., the RSU), whose computing resource capacity is denoted F_0. In addition, there are m vehicles with idle computing resources in the resource pool that can provide computing assistance, namely the SVs, represented as n_m, where SV n_m has computing capacity F_m. The i-th computing task generated by task vehicle k at time slot t ∈ T that needs to be offloaded to the RSU or an SV is represented by a tuple, where v_i(t) indicates the size of the calculation amount (that is, the total CPU cycles) of task v_i^k, b_i(t) represents the data size of task v_i^k, and τ_i is the maximum delay tolerance. When the TV offloads a task to the edge, the V2E channel transmission rate can be expressed by the Shannon capacity over the shared channel, where B_e represents the vehicle-to-edge system bandwidth shared by the |K| task vehicles.

The RSU has relatively limited resources compared to the cloud server, so task congestion and queuing are still inevitable when the RSU is overloaded with excess offloaded tasks; the task quality may then be degraded, and the QoS of tasks cannot be guaranteed. Therefore, offloading tasks to SVs is necessary when the RSU is overloaded. Similarly, when the TV offloads a computing task to SVs for processing, its data and calculation results are transmitted through V2V communication, and the channel transmission rate is calculated analogously [23], where B_v represents the vehicle-to-vehicle system bandwidth shared by the |K| task vehicles and d denotes the transmission distance.

Task execution incurs a certain delay whether the task is processed locally or offloaded to the RSU or an SV, including the data transmission delay and the processing delay.
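As a concrete illustration, the notation above can be sketched as simple data structures (a hypothetical Python sketch; the class and field names and the example capacities are our own, not from the paper):

```python
from dataclasses import dataclass

@dataclass
class Node:
    """A computing node n_j in N: the RSU (j = 0) or a service vehicle (j >= 1)."""
    node_id: int
    capacity_hz: float  # computing resource capacity F_j (CPU cycles per second)

@dataclass
class Task:
    """Task tuple of task vehicle k at slot t: CPU cycles, data size, deadline."""
    cycles: float      # v_i(t): total CPU cycles required by the task
    data_bits: float   # b_i(t): data size of the task
    deadline_s: float  # maximum delay tolerance

# A resource pool with one RSU (n_0) and m = 3 service vehicles.
pool = [Node(0, 10e9)] + [Node(j, 2e9) for j in range(1, 4)]
task = Task(cycles=2e9, data_bits=4e6, deadline_s=0.1)
print(len(pool))  # 4 nodes in N
```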
When the TV has sufficient computing resources, or the computation pressure of the task is relatively low, the task is considered for local processing to reduce the data transmission delay. In this case, the overall delay is equal to the computation delay, which is determined by the total CPU cycles of the task v_i^k and the available computing capacity of the TV. The overall delay of local processing of task v_i^k can be calculated by

T_loc^i = λ b_i c_i / f_k^i,

where λ ∈ [0, 1] indicates the proportion of the task processed locally, b_i is the data size of task v_i^k, c_i represents the CPU cycles required to process a unit of data, and f_k^i is the CPU frequency the TV allocates to process task v_i^k itself.
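The local-processing delay above is a one-line computation; a minimal sketch (the function name and the example numbers are illustrative):

```python
def local_delay(lam, data_bits, cycles_per_bit, f_local_hz):
    """Local processing delay: lambda * b_i * c_i / f_k^i."""
    return lam * data_bits * cycles_per_bit / f_local_hz

# A 4 Mb task at 100 cycles/bit on a 2 GHz onboard CPU, fully local (lambda = 1).
d = local_delay(1.0, 4e6, 100, 2e9)
print(round(d, 3))  # 0.2 seconds
```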

However, purely local processing cannot guarantee the task QoS for computation-intensive tasks or large volumes of tasks, and processing all tasks locally brings higher energy consumption compared with transmitting the data to other servers, which is not conducive to the battery life of the vehicle. Therefore, computation-intensive tasks need to be offloaded to the RSU or to SVs with sufficient resources for processing. The total delay then includes the processing delay in the RSU or SV and the transmission delay of the data, which is calculated by

T_off^i = (1 − λ) b_i / r + (1 − λ) b_i c_i / f_{0,m}^i,

where f_{0,m}^i denotes the CPU frequency allocated to task v_i^k by the RSU n_0 or the SV n_m, respectively, i.e., f_0^i and f_m^i.

Finally, the total delay of the task combines the local processing delay and the offloading delay.

The energy consumption is also non-negligible during task execution and data transmission. Therefore, energy consumption optimization is crucial for a hybrid offloading scheme. Similarly, the energy of task offloading is calculated by (9).
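Under the partial-offloading model above, the offloading delay and a combined total can be sketched as follows. Note the assumption that the local and offloaded fractions are processed in parallel, so the total is their maximum; the paper's exact combination rule is not reproduced in this excerpt:

```python
def offload_delay(lam, data_bits, cycles_per_bit, rate_bps, f_server_hz):
    """Transmission delay plus remote processing delay of the offloaded fraction."""
    tx = (1 - lam) * data_bits / rate_bps
    proc = (1 - lam) * data_bits * cycles_per_bit / f_server_hz
    return tx + proc

def total_delay(lam, data_bits, cycles_per_bit, f_local_hz, rate_bps, f_server_hz):
    """Total delay, assuming the local and offloaded parts run in parallel."""
    local = lam * data_bits * cycles_per_bit / f_local_hz
    remote = offload_delay(lam, data_bits, cycles_per_bit, rate_bps, f_server_hz)
    return max(local, remote)

# Half offloaded: 4 Mb task, 100 cycles/bit, 2 GHz TV, 10 Mb/s link, 10 GHz RSU.
print(round(total_delay(0.5, 4e6, 100, 2e9, 10e6, 10e9), 3))  # 0.22
```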

In addition to the basic computing energy consumption, data transmission produces extra energy consumption, equal to the transceiver power multiplied by the transmission time:

E_trans^i = P_{0,m} b_i / r,

where P_{0,m} represents the transceiver power between the TV k and the RSU n_0, or between the TV and the SV n_m, respectively, i.e., P_0 and P_m.

The total energy consumption of task v_i^k is the sum of the computing and transmission energy, as given by (11) and (12). The constraints are represented by (14)-(16): the computing resources and transmission power allocated to task v_i^k cannot exceed the resource capacity and maximum transmission power of the edge server and service vehicles. The delay constraint of task v_i^k is represented by (16); tasks that exceed the delay constraint will be dropped.

The RL method is different from traditional supervised and unsupervised learning: it learns the optimal strategy by extensively exploring the environment and constantly improving its policy through rewards and punishments, and it is therefore particularly suitable for autonomous decision-making problems. Recently, RL has shown excellent performance and adaptability in control and decision-making problems. This paper realizes the hybrid offloading decision through an advanced RL method. The task offloading process can be modeled as an MDP [26], [27], which includes three basic elements, M = {S(t), A(t), R(t)}, where S(t) and A(t) represent the state space and action space, and R(t) represents the reward function, defined as follows. Since offloading tasks may be generated by multiple TVs, we define the state S(t) as follows:
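The transmission-energy term and a delay/energy trade-off objective can be sketched as below. The weighted-sum form and the α = 0.5 default mirror the coefficient α used in the paper's experiments, but the exact objective expression is an assumption of this sketch:

```python
def transmission_energy(power_w, data_bits, rate_bps):
    """Transmission energy: transceiver power times transmission time, P * b_i / r."""
    return power_w * data_bits / rate_bps

def weighted_cost(delay_s, energy_j, alpha=0.5):
    """Joint objective balancing delay and energy with coefficient alpha
    (the weighting form itself is an assumption of this sketch)."""
    return alpha * delay_s + (1 - alpha) * energy_j

e = transmission_energy(0.5, 4e6, 10e6)  # 0.5 W transceiver, 4 Mb over 10 Mb/s
print(e)                       # 0.2 J
print(weighted_cost(0.22, e))  # 0.21
```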

The agent selects the best offloading server (i.e., the RSU or an SV) based on the task demand and the resource capacities.
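A standard way to realize this selection in DQN is an epsilon-greedy rule over the candidate servers; a minimal sketch (the exploration schedule and names are our assumptions, not specified in the text):

```python
import random

def select_action(q_values, epsilon=0.1):
    """Epsilon-greedy choice over offloading targets: index 0 = RSU, 1..m = SVs."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))   # explore: random server
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit

print(select_action([0.2, 0.9, 0.1], epsilon=0.0))  # greedy -> action 1 (an SV)
```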

The DRL method aims to find an optimal policy that maximizes the expected cumulative discounted reward, where P_{s,s'}^a represents the state transition probability, r is the immediate reward after taking action a in state s, π(a|s) represents the current policy, and γ represents the discount applied to future rewards.

Q*(s, a) represents the optimal state-action value function, indicating that the optimal action a is taken in state s. Therefore, Q*(s, a) can be expressed by the Bellman optimality equation

Q*(s, a) = E[r + γ max_{a'} Q*(s', a')].

In DQN, Q*(s, a) is approximated by a neural network trained to reduce the difference between Q(s, a; θ) and the target value computed with the target network Q̂, where ϑ and θ are the parameters of Q̂ and Q, respectively.
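The target value used to train the evaluation network can be sketched as follows (a standard DQN temporal-difference target; variable names are illustrative):

```python
def td_target(reward, next_q_values, gamma=0.95, done=False):
    """DQN target: r + gamma * max_a' Qhat(s', a'), using the target network's
    estimates for the next state; terminal states contribute no bootstrap term."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)

print(td_target(1.0, [0.5, 2.0], gamma=0.95))  # 1.0 + 0.95 * 2.0 = 2.9
```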

The parameters ϑ of the target network Q̂ are copied from Q every fixed number of steps.

The specific steps of the algorithm are shown in Algorithm 1.

In this section, we discuss a quasi-static scenario involving a single MEC server and multiple vehicles [29]. In urban scenes, vehicles inevitably gather, as discussed in Section I. In each time slot, we assume that the vehicles' positions remain the same and the channel remains stable, since a vehicle may not move for a few seconds, or may move only a very short distance, in the scenario mentioned above. The RSU is connected to the base station and MEC server by a wired link, and an RSU covers the vehicles within its radius; the remaining vehicles can be seen as SVs with different resource capacities. The tasks have various data sizes and computing requirements. The resource capacities of the RSU and SVs follow a uniform distribution. Moreover, the channel gain follows 127 + 30 log d, and the total bandwidth of the system is 20 MHz. Additionally, the Gaussian noise is N_0 = −174 dBm/Hz [29]. The detailed simulation settings are summarized in Table 2.
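Given these settings, a plausible reconstruction of the achievable link rate combines the stated path-loss model with the Shannon capacity formula (the use of Shannon capacity, the unit of d, and the example transmit power are assumptions; this excerpt does not print the rate equation):

```python
import math

def v2e_rate_bps(bandwidth_hz, tx_power_dbm, distance_km, noise_dbm_per_hz=-174.0):
    """Achievable rate under the stated simulation settings:
    path loss 127 + 30*log10(d) dB (d in km assumed) and Shannon capacity."""
    path_loss_db = 127.0 + 30.0 * math.log10(distance_km)
    rx_power_dbm = tx_power_dbm - path_loss_db
    noise_dbm = noise_dbm_per_hz + 10.0 * math.log10(bandwidth_hz)  # noise over band
    snr_db = rx_power_dbm - noise_dbm
    return bandwidth_hz * math.log2(1.0 + 10.0 ** (snr_db / 10.0))

# 20 MHz system bandwidth, 23 dBm transmit power, 100 m to the RSU.
print(round(v2e_rate_bps(20e6, 23.0, 0.1) / 1e6, 1), "Mb/s")
```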

We compare the proposed approach with the local processing approach, the single-scenario offloading approaches, and an advanced hybrid offloading approach to examine the advantages and disadvantages of our proposal. The comparison approaches are described as follows:

To make the algorithm converge quickly and achieve better learning efficiency, we first examine the learning curves under different learning rates and discount factors. Generally, the RL algorithm's convergence speed increases with the learning rate. However, the convergence value cannot be guaranteed to be optimal when the learning rate is too large, as the agent may learn nothing, e.g., learning rate = 0.1 in Figure 3. Conversely, a learning rate that is too small may fail to obtain the optimal solution within an acceptable time. As shown in Figure 3, DQN has the best learning effect when the learning rate = 0.01 with the default discount factor = 0.99, which means that DQN can learn more strategies.

Besides the learning rate, we also examine the influence of different discount factors. The discount factor is set from 0.93 to 0.99, with learning rate = 0.01. The results are shown in Figure 4. The cumulative reward is smallest when the discount factor = 0.97. When the discount factor = 0.95, the DQN has the best learning performance, and the convergence reward is greater than 130. In the subsequent simulations, we set learning rate = 0.01 and discount factor = 0.95.

The optimization objective of RL is to optimize the average delay of all tasks, so the task quantity is important to the evaluation of algorithm performance. Figure 5 shows the tendency of the task delay as the task quantity increases. From Figure 5 we can observe that V2E offloading can process a few tasks well: the task delay is relatively small when the task quantity is small. However, as the number of tasks increases, the available resources for each task decrease, resulting in a rapid increase in the delay of V2E. Similarly, when the task quantity is small, the V2V offloading strategy can ensure a low task delay, but the task delay is clearly affected when the task quantity increases, due to the limited computing capacity.

In contrast, our approach adopts a hybrid offloading strategy to handle multiple tasks, which makes better use of distributed resources and considers the long-term gain of multi-task offloading decisions. Therefore, the advantage of DQN-hybrid becomes obvious as the number of tasks increases. The GT-hybrid offloading strategy outperforms the single-scenario offloading approaches (i.e., only offloading tasks to the RSU or to SVs), while the proposed DQN-hybrid outperforms all the baseline algorithms in terms of task delay. Specifically, the task delay is reduced by 27%, 20%, 16%, and 14%, respectively.

Figure 6 shows the effect of the data volume on the total task delay. Under low resource capacity, V2V offloading has the highest processing delay, followed by LC; the reason is that V2V task offloading brings additional transmission delay compared to LC. As resources increase, however, the V2V task delay becomes smaller than that of LC, because the delay of V2V offloading is greatly reduced by parallel processing. As for the hybrid offloading algorithms, i.e., GT-hybrid and DQN-hybrid, they have lower task delays because they fully consider the different processing nodes. Nevertheless, the DQN algorithm still performs better than GT-hybrid, and the delay is further reduced by about 10%.

With the push for energy saving and emission reduction, electric vehicles have become an inevitable trend, and electric vehicles are more sensitive to energy consumption. Figures 9-10 show the impact of the task quantity and data volume on the vehicles' energy consumption. Since V2E offloading transmits all the data to the RSU for processing, the vehicle's energy consumption comes only from data transmission; thus, V2E offloading has the smallest energy consumption. On the contrary, V2V offloading incurs the highest energy consumption, because no matter where the task is offloaded, the vehicles' energy consumption is unavoidable, and it is slightly higher than that of LC offloading because the transmission energy consumption is avoided in LC offloading. The hybrid offloading approaches (i.e., GT-hybrid and DQN-hybrid) have relatively low energy consumption. Although our proposed DQN-hybrid method sometimes consumes more energy than GT-hybrid, considering that DQN-hybrid has better overall delay performance, it still offers better offloading performance than GT-hybrid. In addition, the coefficient α, which controls the balance of the two optimization goals, was set to 0.5 in these two experiments; different trade-offs between the two goals can be obtained by adjusting this parameter.

3) SUCCESS RATE

Figure 11 shows the task success rates of all algorithms under the conditions of low, medium, and high resource capacity. As defined in Section III-A, a task that completes before its maximum tolerable delay is considered successful.

Our future work includes: 1) investigating policy-based DRL methods, which have the greatest advantage in continuous action space situations;
2) further exploring the generalization ability of the trained DRL model, as well as more realistic and complicated scenarios, such as a dynamic task offloading scenario that considers vehicle movement, in which a task must be completed before the vehicle switches from its corresponding RSU to another.

In this paper, we propose a hybrid task offloading scheme (HyTOS) for the urban IoVs scenario, which jointly considers V2E and V2V offloading to minimize the task delay and energy consumption while making full use of the scattered resources of vehicles. We further propose a deep Q-network (DQN)-based optimal offloading method to satisfy the computing requirements and ensure the delay constraints of the tasks. The simulation results demonstrate that our approach is significantly better than the single-scenario offloading approaches and has better overall performance than the advanced game-theory-based hybrid offloading approach in terms of task delay and success rate. Our approach has good application prospects in delay-constrained and dynamic IoVs scenarios. Future work is in progress to consider a more dynamic task offloading scenario that takes vehicle movement into account.