Vehicular Cloud Forming and Task Scheduling for Energy-Efficient Cooperative Computing

A vehicular cloud (VC) is a network of vehicles that perform cooperative computing through vehicle-to-vehicle (V2V) communication. Existing research on vehicular cloud computing (VCC) is mostly based on cloud servers or edge servers, not VCs. However, by constructing a Vehicular Ad-Hoc Network (VANET), vehicles can cooperatively perform applications requiring large amounts of computation on their own, without the help of edge or cloud servers. One of the important issues for VANET cooperative computing is how to handle the frequent topology changes due to vehicle mobility. The unstable network topology limits the advantage of cooperative computing and sometimes even halts its operation. This paper proposes a cooperative computing method based on V2V communication. For stable and energy-efficient cooperative computing, the proposed method considers the distance when selecting the vehicles it will cooperate with and delays task offloading as long as possible. The proposed method outperforms previous static scheduling methods in terms of energy efficiency and network stability.


I. INTRODUCTION
Recently, vehicles have not only been connected to the Internet via wireless communication, but have also become network nodes that can perform various applications in real time [1]. In particular, applications such as big data analysis and image processing using machine learning require high computing power. Vehicular Cloud Computing (VCC) might be a promising solution to run these kinds of applications smoothly [2], [3], [4].
The VCC has a 3-tier architecture of Cloud-tier, Edge-tier, and Vehicle-tier. Each tier has different features, so each is suitable for different VCC applications. First, the Cloud-tier has the highest computing power and largest memory, but it may have the longest latency because cloud servers are located far away. Thus, the Cloud-tier is good for applications that require large amounts of memory and computation but do not require short latency, such as infotainment services. Infotainment services are information and entertainment services that are not related to driving safety, requiring a latency of 1 second and a bandwidth of 80 Mbps or more [5].
On the contrary, the Vehicle-tier consists of vehicles. Because a vehicle can receive data from other vehicles directly, this tier offers the shortest latency. Therefore, it is appropriate for applications requiring very short latency, such as collision warning, for which the acceptable end-to-end delay is just 20-50 ms [5]. However, this tier is not good for performing computation-intensive applications due to its lack of computing power.
Finally, the Edge-tier can provide shorter latency than the Cloud-tier, and more computing power and memory than the Vehicle-tier. Thus, it is suitable for applications such as traffic information sharing and analysis services, which need a delay of 100-500 ms and a throughput of 10-45 Mbps [5].
Due to the moderate latency and computing power, some papers [6], [7] suggest that most of the computation work on vehicles is offloaded to edge servers. However, edge servers always have certain constraints on both computing power and memory unlike cloud servers, thus if too many tasks are given to edge servers, the service quality might be degraded and even the security might be endangered [6], [8].
In this situation, vehicles can relieve the burden on edge servers. Vehicles in the previous VCC model [9] were considered just end-terminals sending requests to cloud or edge servers, but this is not all that a vehicle can do. If they construct a Vehicular Ad-hoc Network (VANET) and cooperatively run many applications by themselves through vehicle-to-vehicle (V2V) communication, both the burden on edge servers and the network traffic load can be reduced. In addition, it is better for privacy, since private information does not pass through external networks [10].
We suggest a platform that enables vehicles to efficiently and reliably perform cooperative computing in VANETs built through V2V communication. In this cooperative computing, the tasks of a vehicle are offloaded to other vehicles. The vehicle that requests task offloading is called a client vehicle (CV), and a vehicle that helps a CV is called a worker vehicle (WV). Task offloading takes two steps: a CV finds candidate WVs around itself and then distributes tasks to them according to the task execution schedule. Task offloading improves application running speed, but on the other hand, it requires additional cost to transfer data. Minimizing both the time and the energy for data transfer is a key performance metric for a vehicular cloud.
The latest vehicle network modules support both direct V2V communication, using Dedicated Short Range Communication (DSRC) [11] or C-V2X mode 4 [12], and cellular connectivity using LTE C-V2X [13]. A cellular-based connection can cover a wider range, but it is expensive, so it is more economical to transmit over V2V connections as much as possible. However, V2V connections may not be reliable because vehicles move fast: the distance between two vehicles in a VANET changes continuously due to their different speeds. Whereas cellular communication can reach several kilometers, the transmission range of DSRC V2V communication is just several hundred meters to 1 km, so vehicles can easily move out of each other's range. In this case, an attempt can be made to resume the communication over the cellular network, but this increases cost and delay. Therefore, in order to improve VANET stability and minimize transmission cost, a CV needs to select the WVs that are least likely to go out of its communication range.
Another important factor in improving the efficiency of task offloading is task scheduling, which finds the best order in which to run tasks. The task schedule determines the amount of data to be transferred, the associated transfer cost, and eventually the final completion time of all tasks. Most previous studies have focused only on minimizing this execution time. In contrast, we consider not only execution time but also energy cost when designing our task scheduling method. The energy cost is particularly important for electric vehicles, since it can affect their driving range. This paper proposes Stepwise Computation Offloading for Cost-efficient Cooperation (SCOCC) to form a vehicular cloud and find an energy-efficient task schedule. The main idea of SCOCC is to minimize the time interval between task offloading and task execution, and to consider the distance between a CV and WVs when choosing WVs. An application is decomposed by a CV into tasks, which are logical minimum units of work, represented in Directed Acyclic Graph (DAG) form. The tasks are then offloaded only to a fixed number of WVs, in order of proximity to the CV. Note that not all tasks are offloaded at once; rather, parts of them are offloaded sequentially to minimize the time difference between scheduling and task execution.
Finding an optimal schedule for tasks represented in a DAG is known to be NP-complete [14], [15]. Thus, SCOCC is a heuristic method for performing reliable and energy-efficient task offloading. To evaluate our heuristic in practical networks, we adopted the Veins simulator [16]. The simulation results showed that, in a straight-road environment, SCOCC can reduce the probability of WVs going out of the CV's range to just 1.7%, from the 15.8% of conventional static algorithms. Furthermore, the energy consumption for data transfer is reduced by 69.5%.
The contributions of this paper are as follows:
• SCOCC builds vehicular clouds that maintain stable V2V connectivity despite vehicle mobility.
• Additional energy consumption for cooperative computing is minimized by reducing the average data transmission distance.
• Performances are evaluated with DAGs based on real applications in practical networks.
Before moving on to the next section, we would like to discuss why a WV serves a CV in V2V computation offloading. First, we could expect voluntary involvement from all vehicles to achieve common goals such as smooth autonomous driving service. In fact, the Internet service also depends on the voluntary participation of all network equipment. Second, a platform can be devised in which WVs receive a reward for their services. In fact, we are developing a blockchain-based, tamper-proof credit system. A non-fungible token (NFT) might also be used as the credit. However, this paper simply assumes that all vehicles are willing to cooperate with others, since blockchain- and NFT-based systems are beyond the scope of this paper.
The rest of the paper is organized as follows. Section II introduces related work. In Section III, we define the system model and analyze it with mathematical equations. Section IV describes our SCOCC heuristic algorithm, Section V presents the simulation results, and Section VI discusses our analysis of the results and the limitations of the study. Finally, Section VII concludes this paper.

II. RELATED WORK
Task offloading using cloud computing was initially proposed in a mobile environment [17] and then extended to VCC. A typical example of using VCC is the intelligent transportation system (ITS) [2], [18], [19], which provides services to reduce traffic accidents and facilitate traffic flow.
Mao et al. [7] presented computation offloading based on the Lyapunov function to minimize cost on edge servers using dynamic voltage and frequency scaling (DVFS) and power control. Assuming that tasks were independent of each other, they decided whether or not to offload each task to an edge server.
Sun et al. [20] proposed a scheduling scheme that runs on a vehicular cloud or an edge cloud. Their scheme first computes the cell dwell time of each vehicle based on its distance to a base station. This cell dwell time is used to form a vehicular cloud, and then tasks are scheduled to minimize the completion time. It is a low-complexity, genetic-algorithm-based scheduling method that considers task dependency in the optimization.
Unlike the above methods, Dai et al. [6] studied the case where multiple edge servers exist. For load balancing, tasks in a vehicle were split and offloaded to multiple edge servers.
Xu et al. [21] performed a multi-objective optimization, which reduces both execution time and energy consumption and prevents privacy conflict, by adopting non-dominated sorting genetic algorithm II (NSGA-II). However, they did not consider the dependency between tasks when finding routes from vehicles to suitable edge servers.
While most studies let a client select workers, Zhou et al. [22] suggested a method in which candidate workers choose an actual worker by voting. This voting system works because all workers in that paper are assumed to be wired edge servers and every server's computing resource information is known to all. However, this system is not suitable for wireless vehicular cloud networks, because too many packets must be exchanged between vehicles to share resource information in real time.
One of the most important issues in cooperative computing in VCCs is how to schedule the execution of tasks. List scheduling [23], [24], [25], [26], a typical DAG scheduling method, consists of two steps. First, priorities are calculated and assigned to tasks in order; for the priority calculation, the cumulative sum of node costs and edge costs from the starting point to each task is generally used [25], [27]. In the second step, the node or device that will run each task is determined using the method's own algorithm.
Heterogeneous Earliest Finish Time (HEFT) [26], a representative list scheduling method, assigns tasks to the devices that can complete them in the shortest time, using insertion-based scheduling. Although HEFT is a static algorithm that requires information about all devices and tasks in advance, it has long been considered a leader in DAG scheduling due to its outstanding performance.
The duplication based algorithms [28], [29], [30], [31] create copies of tasks on participating devices to minimize data transfer overhead. Significant performance gains can be expected if the resources of the devices are abundant, but we cannot say that it is efficient given the amount of resources it uses.
Hu et al. [31] proposed a platform in which dynamically formed vehicle clusters act as edge servers that perform computations for surrounding mobile devices. It is noteworthy that vehicles on this platform do not request offloading, but act as edge servers. An application to be offloaded is scheduled with the Greedy-Based Task Scheduling Algorithm (GBTSA), which uses a greedy task-copy technique that considers inter-task dependency. This scheduling technique applies the task scheduling proposed in [32] to the vehicular cloud.
Existing studies have mostly assumed offloading to edge servers and proposed scheduling techniques to minimize execution time. This paper attempts to minimize energy consumption and the probability of WVs moving out of the CV's coverage by considering the distance between vehicles when selecting WVs and assigning tasks.

III. SYSTEM MODEL
All vehicles mentioned in this paper are electric vehicles equipped with GPS and transmission devices capable of both direct V2V communication and cellular communication, and they can perform tasks requested by other vehicles using virtual machine (VM) technology. Fig. 1 shows the overall system flow. For computation offloading, a CV first requests information from other vehicles within its transmission radius via direct V2V communication, and then selects WVs based on the returned information. The detailed criteria for selecting WVs among the discovered vehicles will be discussed later. An application app_c of CV v_c is decomposed into tasks, the smallest logical units that can no longer be decomposed, and those tasks are represented as nodes in a DAG. The problem of deciding which vehicle should perform each task is an important factor affecting the overall execution time and is known to be NP-complete [14], [15]. The data and results of task offloading are transmitted by default through direct V2V communication. However, a CV and a WV may sometimes be farther apart than the V2V communication range when the offloaded work is completed; in this case, the result is transmitted through cellular networks. In this paper, app_c is an application that must be completed within its deadline, and CV v_c requests an appropriate WV v_j to execute task t_i of app_c so that it can be completed within the deadline, considering the computing power and data transfer times of the one-hop neighbor vehicles. The task scheduling (or task mapping) that determines which WV is assigned t_i is an important factor in determining the application completion time and energy consumption.

A. SYSTEM DESCRIPTION
A CV v_c tries to offload the tasks of an application app_c to other vehicles. v_c first searches for nearby vehicles, chooses WVs, and then requests them to process the tasks of app_c, expressed in a DAG. To generate each byte of output data for task t_i, ipb_i instructions must be executed on average. So, if the output data of task t_i is D_o(t_i) bytes, a total of D_o(t_i) × ipb_i instructions are needed. Each vehicle has different computing power, expressed as the number of instructions per second, ips, and data transmission between vehicles is simply modeled as the amount of data transmitted per second, in bytes/sec.
In order to collect information from surrounding vehicles, v_c first broadcasts a message requesting information about them. Among the vehicles receiving this message, those willing to provide their computing resources return information about their current location, current movement status, and the computing resources that can be provided to v_c. These resources are what remains besides those reserved for their own work, as all vehicles prioritize their own tasks. Then v_c has to decide which tasks to assign to which vehicles based on the received information. Since the completion time of the application and the amount of energy consumed vary depending on the schedule that allocates tasks, it is necessary to find a schedule that minimizes energy consumption under the condition that the application is completed before its deadline T. Data transmission overhead is very important in a vehicular network since the channel is wireless: wireless channels require more time and energy than wired lines to send data to other vehicles. If the computation offloading cost, including the data transmission cost, exceeds its gain, there is no need to offload tasks.
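As an illustration of this break-even consideration, the following sketch compares local execution time with offloading time under the instruction-count model described above. The helper names and all numeric values are our own illustrative assumptions, not values from the paper.

```python
# Illustrative break-even check for computation offloading.
# Notation follows the paper (ipb, ips, D_o, b); the numbers are made up.

def instructions(d_out_bytes: float, ipb: float) -> float:
    """Total instructions for a task that outputs d_out_bytes."""
    return d_out_bytes * ipb

def local_time(d_out_bytes, ipb, ips_local):
    """Time to run the task on the CV itself."""
    return instructions(d_out_bytes, ipb) / ips_local

def offload_time(d_in_bytes, d_out_bytes, ipb, ips_worker, b):
    """Transfer input to the WV, execute remotely, transfer the result back."""
    transfer = (d_in_bytes + d_out_bytes) / b
    return transfer + instructions(d_out_bytes, ipb) / ips_worker

# Example: 10 MB in/out, 50 instructions/byte, CV at 4e8 instr/s,
# WV at 8e8 instr/s, 25 Mbps (~3.125 MB/s) V2V link.
MB = 1_000_000
t_local = local_time(10 * MB, 50, 400 * MB)
t_off = offload_time(10 * MB, 10 * MB, 50, 800 * MB, 3.125 * MB)
print(t_local, t_off)  # offloading pays off only if t_off < t_local
```

With these (made-up) numbers the transfer time dominates, so offloading this particular task would not pay off, which is exactly the situation the last sentence above describes.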

B. PROBLEM DEFINITION
A DAG G = (T , E) is adopted to model tasks of an application app c . T is a set of nodes meaning tasks t i 's of app c and E is a set of directed edges representing the dependencies between tasks. A task without any incoming edge is called a source task. This paper deals with DAGs that have only one source task. But, this does not limit the practicality of the proposed method because DAGs with multiple source tasks can be easily converted to DAGs with one source task by adding one void task that has outgoing edges to all original source tasks. If there exists an edge from t i to t l , t i is called an immediate predecessor (i-predecessor) of t l . Conversely, t l is an immediate successor (i-successor) of t i . All predecessors must be completed before their successors.
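The conversion to a single-source DAG mentioned above can be sketched as follows; the adjacency-dict representation and the name t_void are our own illustrative choices.

```python
# Convert a multi-source DAG into a single-source DAG by adding one
# zero-cost "void" task with outgoing edges to all original source tasks.
# The DAG is a plain adjacency dict {task: set(successors)}.

def add_void_source(dag: dict, void: str = "t_void") -> dict:
    tasks = set(dag) | {s for succs in dag.values() for s in succs}
    has_incoming = {s for succs in dag.values() for s in succs}
    sources = tasks - has_incoming            # tasks without incoming edges
    out = {t: set(succs) for t, succs in dag.items()}
    out[void] = set(sources)                  # void task precedes all sources
    return out

dag = {"t0": {"t2"}, "t1": {"t2"}, "t2": set()}  # two source tasks t0, t1
single = add_void_source(dag)
print(sorted(single["t_void"]))  # ['t0', 't1']
```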
After CV v_c makes a set V_w of WVs from the 1-hop neighbor vehicles that respond to its broadcast message, each task in T is assigned to a WV in V_w. The binary variable x_ij indicates whether task t_i is assigned to WV v_j: x_ij = 1 if t_i is assigned to v_j, and x_ij = 0 otherwise.
X is the set of all x_ij, which represents a task map assigning the tasks in T to the WVs in V_w.
v_j receives the assigned task via V2V communication and checks the DAG to determine when to start executing it. v_j can execute t_i after receiving the results of t_i's i-predecessors. When the execution of t_i is completed, the resulting data of size D_o(t_i) should be transmitted to the vehicles performing t_i's i-successors. The time spent executing t_i on v_j, denoted by w_ij, is the number of instructions to be processed for t_i divided by the number of instructions that v_j can process per second:

w_ij = D_o(t_i) × ipb_i / ips_j
The expected finish time of t_i is the sum of its starting time ST_i and its processing time on v_j:

FT_i = ST_i + w_ij
In order for v_j to start t_i, all i-predecessors of t_i must be completed and their results must be transmitted to v_j. Also, among the tasks assigned to v_j, all tasks with higher priority than t_i should be completed first. The latest finish time of all t_i's i-predecessors, LFT_i, is defined as the latest time at which their execution and the transfer of their results to v_j are completed:

LFT_i = max_{t_p ∈ pred(t_i)} ( FT_p + D_o(t_p) / b )

where pred(t_i) is the set of t_i's i-predecessors, and b is the data transmission speed between vehicles. The execution order of the tasks assigned to a WV is already represented in the DAG: a predecessor must be completed before its successors. The time at which v_j becomes available for t_i, that is, when all of v_j's tasks with higher priority than t_i are completed, is denoted by AT_ij.
The earliest start time at which t_i can run on v_j, EST_ij, is the later of LFT_i and AT_ij:

EST_ij = max(LFT_i, AT_ij)
The earliest finish time of t_i on v_j, EFT_ij, is:

EFT_ij = EST_ij + w_ij + tr_i

where tr_i is the time it takes to transfer the result of t_i from v_j to the WVs executing t_i's i-successors, which can be obtained as:

tr_i = n_s(t_i) × D_o(t_i) / b

where n_s(t_i) is the number of vehicles executing i-successors of t_i.
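This bookkeeping can be sketched as below, under our reading that EST_ij = max(LFT_i, AT_ij) and that EFT_ij adds the execution and result-transfer times; the function names and all numbers are illustrative, not from the paper.

```python
# Sketch of the LFT/EST/EFT computation for one task on one candidate WV.
# pred_finish: finish time of each i-predecessor; d_out_pred: its output size.

def lft(pred_finish: dict, d_out_pred: dict, b: float) -> float:
    # latest time a predecessor finishes AND its result reaches v_j
    return max(pred_finish[p] + d_out_pred[p] / b for p in pred_finish)

def eft(pred_finish, d_out_pred, at_ij, w_ij, tr_i, b):
    est = max(lft(pred_finish, d_out_pred, b), at_ij)  # EST_ij
    return est + w_ij + tr_i                           # EFT_ij

finish = {"t0": 2.0}          # predecessor t0 done at t = 2 s
sizes = {"t0": 5_000_000}     # 5 MB result
t = eft(finish, sizes, at_ij=1.0, w_ij=0.5, tr_i=0.8, b=3_125_000)
print(t)  # 2.0 + 1.6 (transfer) + 0.5 (execute) + 0.8 (forward result)
```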
For t_i to be executed on v_j, all higher-priority tasks must be completed first, so t_i's starting time ST_i should satisfy the following inequality:

ST_i ≥ EST_ij (10)
In order for app_c to be completed before the deadline T, all tasks must be completed before T:

EFT_ij × x_ij ≤ T for all t_i (11)
The total energy E_total consumed for t_i is the sum of the energy E_exe to execute t_i and the energy E_tr to transmit its results to the vehicles with t_i's i-successors:

E_total = E_exe + E_tr
In general, data transmission energy dominates total energy consumption in wireless networks. Likewise, in our vehicular clouds, changes in task mapping cause a larger change in data transfer energy than in task execution energy, assuming the processors in all vehicles consume similar power to execute an instruction. This assumption makes sense: electric vehicles are likely to feature the best processors of their time, and, as with processors for PCs and laptops today, these processors are commonly produced by a few manufacturers. As a result, the total E_exe is almost the same no matter where tasks are performed, so this paper focuses on data transmission energy.
To calculate the data transmission energy E_tr, we adopt the energy model in Liu's study [33]. In that paper, the energy for transferring M bytes between v_j and v_k is given as:

E(M, d_jk) = ε_fs × M × (d_jk)² (13)

Here, ε_fs is the energy to transmit one byte in free space, and d_jk is the distance between v_j and v_k. Thus, for all tasks in an application, the total energy E_tr needed to transmit result data from the vehicle v_j executing each task t_i to the vehicle v_k executing each of its successors t_m can be computed as follows:

E_tr = Σ_{t_i ∈ T} Σ_{t_m ∈ succ(t_i)} ε_fs × D_o(t_i) × (d_jk)² (14)
where succ(t_i) is the set of t_i's i-successors. By default, the data and results of cooperative computing are transmitted through direct V2V communication. However, at the time a WV finishes its task, it is often out of the direct communication range of the vehicles that should receive the result. In this case, the data is transmitted through cellular networks. If p_out is the probability that the WV v_j executing t_i is outside the direct communication range at the time t_i is completed, and n_s(t_i) is the number of vehicles with i-successors of t_i, the expected data transmission cost can be obtained as follows:

E[C_tr] = Σ_{t_i ∈ T} ( p_out × c_ce + (1 − p_out) × c_di ) × n_s(t_i) × D_o(t_i) (15)

where c_ce is the cost of transmitting one byte over cellular networks and c_di is the cost of transmitting one byte directly over V2V communication. The more transmissions over cellular networks, the higher the overall cost, because cellular networks cost more. Regarding the cost, we consider the fee that should be paid to network operators when the cellular network is used. Beyond the monetary fee, DSRC is also more efficient than cellular communication in terms of energy and transmission time. The current DSRC and C-V2X are based on WiFi and LTE, respectively, and the energy efficiency of WiFi is about 60% higher than that of LTE [34]. Also, the transmission times of DSRC and C-V2X are 0.4 ms and 1 ms, respectively. Therefore, we argue that it is better to use DSRC direct communication as much as possible, rather than cellular communication, for computation offloading.
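The expected-cost computation can be sketched as follows, assuming c_ce and c_di are per-byte costs weighted by p_out as described above; all numbers are illustrative.

```python
# Expected data transmission cost: each task's result goes to n_s vehicles;
# with probability p_out the (more expensive) cellular network is used.

def expected_cost(tasks, p_out, c_ce, c_di):
    """tasks: list of (n_successor_vehicles, output_bytes) pairs."""
    per_byte = p_out * c_ce + (1 - p_out) * c_di
    return sum(n_s * d_out * per_byte for n_s, d_out in tasks)

tasks = [(2, 1_000_000), (1, 4_000_000)]   # (n_s(t_i), D_o(t_i)) pairs
cheap = expected_cost(tasks, p_out=0.02, c_ce=1e-6, c_di=1e-8)
pricey = expected_cost(tasks, p_out=0.20, c_ce=1e-6, c_di=1e-8)
print(cheap < pricey)  # a lower p_out yields a lower expected cost
```

This mirrors the argument in the text: since c_ce is much larger than c_di, driving p_out down is the main lever for reducing the expected transmission cost.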
As a result, we need to find a task mapping X that minimizes E[C_tr] under the conditions of (10) and (11). It can be defined as an optimization problem with the following objective function:

minimize_X E[C_tr], subject to (10) and (11)

IV. PROPOSED METHOD
A. SCOCC ALGORITHM
A CV collects information about nearby vehicles, such as their computing resources and location. Then, it assigns tasks to other vehicles. Communication cost, total execution time, and network stability vary depending on the schedule map X. Also, in a wireless communication system, it is very important to minimize the energy and time required for data transmission. Furthermore, in a vehicular environment, the distance between vehicles changes quickly due to rapid mobility. This makes it difficult to perform reliable cooperative computing and increases the cost, so a CV should select, as WVs, the vehicles that are least likely to go out of the CV's communication range.

Algorithm 1 SCOCC
1: T_exe ← ∅
2: T_ready ← the source task
3: while app_c is not completed do
4:   V ← Searching for nearby vehicles
5:   for t_i ∈ T_exe do
6:     T_ready ← t_m where t_m ∈ succ(t_i)
7:   end for
8:   T_exe ← ∅
9:   V_w ← CV and η vehicles nearest to the CV among v_j ∈ V
10:  Calculating b-levels for all t_i ∈ T_ready
11:  Sorting tasks in T_ready in the descending order of b-level
12:  while T_ready ≠ ∅ do
13:    Taking the first task t_i in T_ready
14:    Calculating EFT_ij for all v_j ∈ V_w
15:    x_ij ← 1 where j = arg min_{v_j ∈ V_w} EFT_ij
16:    T_ready ← T_ready − t_i
17:  end while
18:  for x_ij ∈ X do
19:    Assigning t_i to v_j where x_ij = 1
20:    Transferring the data of t_i to v_j
21:  end for
22:  Waiting for a message saying any task t_i has started
23:  T_exe ← t_i
24: end while
The proposed SCOCC builds a DAG with the tasks of an application and sequentially schedules task execution considering the dependencies between tasks. Initially, SCOCC in the CV starts the assignment process for the source task: it calculates the EFT of the source task on each WV and, according to the EFTs, selects a WV and assigns the source task. After the source task assignment, SCOCC only assigns tasks whose i-predecessors have started running. For this assignment, SCOCC newly searches for neighbor vehicles and transfers the succeeding tasks to them. SCOCC repeats the search and allocation process whenever a new task starts running, which can be a drawback compared to previous methods. However, the overhead of these search processes may be negligible, since the offloading request and response messages are just a few hundred bytes, while the offloaded data and task execution results are tens of megabytes. The energy consumption overhead is also low, since the energy needed for transmission is proportional to the data size according to (13).
Algorithm 1 describes the process of SCOCC. Line 1 resets T_exe, the set of tasks that have started execution, and Line 2 initializes T_ready, the set of tasks that are ready to be executed, with the source task. Then the loop that determines the scheduling of the tasks in T begins (Line 3). The client vehicle broadcasts a message to search for nearby vehicles in Line 4. The i-successors of the tasks in T_exe are stored in T_ready (Lines 5-7). At first, no task is in T_exe, and the source task has already been put into T_ready in Line 2. In Line 9, only the η vehicles closest to the CV are selected as WVs among the searched vehicles. The CV itself also participates in task processing. The value of η is determined later by experiments.
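The per-round WV selection and EFT-greedy mapping of Algorithm 1 can be condensed as the sketch below. For brevity it omits updating each WV's availability AT_ij after an assignment, eft(task, wv) is assumed to be provided, and all data is illustrative.

```python
# One SCOCC scheduling round: keep the eta vehicles nearest to the CV,
# then greedily map each ready task, in descending b-level order, to the
# WV with the minimum EFT.

def schedule_round(ready, b_level, vehicles, dist_to_cv, eta, eft):
    workers = sorted(vehicles, key=dist_to_cv.get)[:eta]   # nearest first
    mapping = {}
    for task in sorted(ready, key=b_level.get, reverse=True):
        mapping[task] = min(workers, key=lambda w: eft(task, w))
    return mapping

ready = ["t1", "t2"]
blv = {"t1": 9.0, "t2": 5.0}                      # t1 is more urgent
dist = {"v1": 30, "v2": 220, "v3": 80}            # metres to the CV
eft_tbl = {("t1", "v1"): 4, ("t1", "v3"): 6,
           ("t2", "v1"): 7, ("t2", "v3"): 3}
m = schedule_round(ready, blv, ["v1", "v2", "v3"], dist, eta=2,
                   eft=lambda t, w: eft_tbl[(t, w)])
print(m)  # {'t1': 'v1', 't2': 'v3'} — distant v2 is never considered
```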
In order to determine task priorities, we compute the b-level of all tasks in T_ready. The b-level of a task indicates how much time must elapse between the start of that task and the completion of the final task. This means that the execution of any task must begin at least its b-level before the application deadline T in order to complete the whole work successfully. The b-level is calculated as follows:

b-level(t_i) = w_i + max_{t_m ∈ succ(t_i)} ( D_o(t_i)/b + b-level(t_m) )
where w_i is the average execution time of t_i. If succ(t_i) = ∅, which means t_i is one of the final tasks without any successors, then its b-level is set to just its execution time w_i. The higher the b-level, the earlier the task must be started. Thus, each task is given priority in descending order of the b-level (Line 11). Line 12 starts a loop to assign each task in T_ready to a vehicle in V_w. In every round, the loop takes the first task in T_ready, which has the highest b-level among T_ready's tasks. Then it calculates the EFT for the task when executed on each v_j in V_w. The vehicle that shows the smallest EFT for the task is marked as the worker for the task in X (Lines 12-17). Then, according to X, each ready task is assigned to the associated worker (Lines 18-21). After that, the CV waits for the arrival of a message saying that any assigned task has started running (Line 22). This message is sent by a WV that has started the execution of an assigned task. When receiving this message, the CV puts the task into T_exe and performs Lines 4-23 again. This is repeated until the entire application is completed. While searching for nearby vehicles in Line 4, a message stating the start of a task may arrive. This task is also put into T_exe.
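The b-level recursion described above can be sketched as follows; the tiny DAG and the timing values are illustrative.

```python
# Recursive b-level: a task's average execution time plus, over its
# i-successors, the largest sum of result-transfer time and successor b-level.
from functools import lru_cache

succ = {"t0": ("t1", "t2"), "t1": (), "t2": ()}
w_avg = {"t0": 2.0, "t1": 3.0, "t2": 1.0}   # average execution times (s)
xfer = {"t0": 0.5}                           # D_o(t0)/b result-transfer time

@lru_cache(maxsize=None)
def b_level(t):
    if not succ[t]:              # final task: just its own execution time
        return w_avg[t]
    return w_avg[t] + max(xfer[t] + b_level(m) for m in succ[t])

print(b_level("t0"))  # 2.0 + max(0.5 + 3.0, 0.5 + 1.0) = 5.5
```

So t0 must start at least 5.5 s before the deadline, and the longer branch through t1 is the one that determines its priority.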

B. NUMERICAL EXAMPLE
Figs. 2 and 3 illustrate an example of cooperative computing using SCOCC. A CV v_c executes the application represented by the DAG in Fig. 2 with the help of nearby vehicles. Table 2 shows the execution time of each task on each WV, and the b-levels calculated using these execution times are in Table 3. In this example, tasks are scheduled in three rounds. In the first round, only the source task t_0 is assigned. Three vehicles v_1, v_2, and v_3 are found through the search mechanism, and the EFT of t_0 on each vehicle is calculated as shown in Table 4. Since v_1 has the smallest EFT, t_0 is assigned to v_1.
Right after starting t 0 , v 1 sends v c a message about it. Then v c begins the second round to assign the successor tasks t 1 , t 2 , and t 3 . A new search process discovers the same v 1 , v 2 , and v 3 as before. Priority is given in the order t 1 , t 3 , t 2 according to their b-levels. The EFTs of t 1 , t 2 , t 3 in v 1 , v 2 , v 3 are given in Table 5. According to this result, t 1 , t 3 , t 2 are assigned to v 1 , v c , v 2 respectively, since these vehicles have the smallest EFT for each task. When t 1 in v 1 starts, the successor tasks t 4 and t 5 can be scheduled. In this third round, vehicles v 4 and v 5 are additionally discovered, and t 4 is given a higher priority than t 5 according to the b-level. The EFTs calculated for all WVs are shown in Table 6, so t 4 and t 5 are assigned to v 1 and v 5 , respectively.

V. PERFORMANCE EVALUATION
The performance of SCOCC was evaluated through simulation. We used Veins [16] to consider practical road situations and vehicle movement. In this simulation, vehicles move at random speeds on a straight road in the Manhattan Street network, which is a typical topology for VANET simulations. A new vehicle is generated on the road every 2 seconds and moves at a random speed between 25 and 30 km/h. To take the change of inter-vehicle distance into account, each vehicle's speed is increased by 5 km/h after 5 seconds and then reduced back to the original speed after another 5 seconds. All vehicles have arbitrary computing power, which follows a normal distribution with a mean of 400 and a standard deviation of 100. For the link and physical layers, we rely on DSRC/IEEE 802.11p and cellular networks, which provide collision control using CSMA/CA and channel allocation. The data transmission speed between vehicles is set to 25 Mbps, since the DSRC data rate is known to be up to 27 Mbps. The performance is compared with Heterogeneous Earliest Finish Time (HEFT) [26], the Greedy-Based Task Scheduling Algorithm (GBTSA) [31], and random scheduling. The target applications include a randomly generated DAG, a DAG of Gaussian Elimination, and a DAG of a Fast Fourier Transform.
We do not take congestion into consideration; vehicles offload their data sequentially, because this simulation focuses on how well each algorithm distributes tasks to other vehicles. Congestion is not directly related to the decision of which task should be given to which worker vehicle. Meanwhile, some studies such as [6] consider the case where vehicles simultaneously send packets, in order to study load balancing of data offloading. Also, as in most previous studies on computation offloading, we do not consider the case where WVs go offline or deliberately withhold execution results, because it is irrelevant to the following performance metrics.

A. PERFORMANCE METRICS 1) SPEEDUP
The schedule length is the time consumed to complete an application through cooperative computing. Speedup is the sum of the running times of all tasks divided by the schedule length. This value represents the performance improvement achieved through computation offloading; the higher this value, the better the algorithm.
2) DATA TRANSFER ENERGY
The data transfer energy E_tr, defined in (14), is the total amount of energy consumed for data transmission while an application is executed through computation offloading. Energy is one of the most important resources for electric vehicles, and the largest part of the additional energy needed for computation offloading is data transmission energy. The lower the energy, the more likely the offloading algorithm is to be adopted in practice.

3) p_out
In SCOCC, vehicles use direct V2V communication by default for sending offloading tasks and their results. However, cooperating vehicles may move out of direct communication range while executing assigned tasks, so the task results may not be deliverable through V2V communication. In this case, the results must be transmitted over cellular networks instead. In (15), p_out is defined as the probability that a vehicle is outside the direct communication range of the other cooperating vehicles at the completion time of its assigned task. Cellular networks are generally more expensive than V2V communication, so the lower the p_out, the lower the data transmission cost. We obtain the expected value of p_out through simulation.
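Since p_out is estimated by simulation, a Monte Carlo sketch conveys the idea. This is our own toy model, not eq. (15): a WV starts next to the CV, drifts at a random relative speed while computing, and counts as "out" if the gap at task completion exceeds the V2V range.

```python
import random

def estimate_p_out(trials, comm_range_m, rel_speed_mps, task_time_s, seed=0):
    """Monte Carlo estimate of p_out under an assumed drift model.
    rel_speed_mps and task_time_s are (lo, hi) ranges sampled uniformly."""
    rng = random.Random(seed)
    out = 0
    for _ in range(trials):
        v = rng.uniform(*rel_speed_mps)      # relative speed (m/s)
        t = rng.uniform(*task_time_s)        # task execution time (s)
        if abs(v) * t > comm_range_m:        # drifted beyond V2V range?
            out += 1
    return out / trials
```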

B. SETTING THE VALUE OF η
The value of η, the number of WVs selected from the searched ones, is important to the performance of SCOCC. Intuitively, the number of WVs required is closely related to the number of tasks to be assigned, n_t. Thus, we repeated the simulation with η = n_t + k for k = 0, . . . , 6 to find the best η. Fig. 4 presents Speedup, data transfer energy, and p_out against k. Speedup did not change with k, but the data transfer energy and p_out were best when k = 3. Based on this result, η was set to n_t + 3 in all subsequent simulations.
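The sweep above can be sketched as a small search loop. The cost function combining energy and p_out below is our assumption for illustration; the paper simply inspects Fig. 4 rather than optimizing a weighted cost.

```python
def choose_eta(n_t, run_sim, ks=range(7)):
    """Sweep eta = n_t + k for k = 0..6 and return the eta minimizing an
    assumed cost over (speedup, energy, p_out) results from `run_sim`,
    a stand-in for one full simulation run at a given eta."""
    best_eta, best_cost = None, float("inf")
    for k in ks:
        _, energy, p_out = run_sim(n_t + k)
        cost = energy + 1000.0 * p_out   # assumed weighting, not the paper's
        if cost < best_cost:
            best_eta, best_cost = n_t + k, cost
    return best_eta
```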

C. RANDOMLY GENERATED DAGs
We randomly generated two types of DAGs: one shallow and wide, the other deep and narrow. The first type has a depth of 3, and each level except the root consists of 4 to 8 tasks at random (Fig. 5). The second type has a depth of 6, and each level except the root consists of 3 or 4 tasks (Fig. 6). The former models applications with a high degree of parallelism among tasks, while the latter models applications with long dependencies between tasks. The number of tasks in each level follows a uniform distribution, and each task has n incoming edges from its predecessors according to a given probability distribution.
Fig. 7 shows Speedup for each algorithm according to the number of tasks. While GBTSA and HEFT achieved speed improvements of 3.31 times and 2.89 times, respectively, SCOCC achieved 2.74 times. Random scheduling showed the lowest improvement, just 2.59. Speedup of SCOCC is lower than that of GBTSA and HEFT because SCOCC intentionally limits parallelism to mitigate the problems caused by fast vehicle mobility. This reduces the possibility that the distance between vehicles exceeds the V2V communication range, thereby reducing data transmission costs. The next two figures clearly show the effectiveness of this design. Fig. 8 shows the energy used for data transmission according to the number of tasks in the DAG. While GBTSA and random scheduling used 41.3 kJ and 28.4 kJ on average, HEFT consumed a relatively small 17.4 kJ. SCOCC used just 5.4 kJ, far less than the other algorithms. SCOCC reduces the average distance between cooperating vehicles by taking only the η vehicles closest to the CV as WVs, and this is clearly effective in reducing energy consumption. Fig. 9 compares p_out, the probability that cooperating vehicles are out of V2V communication range of each other.
While the p_out values of GBTSA, random scheduling, and HEFT were 0.184, 0.181, and 0.11, respectively, it was only 0.003 with SCOCC, meaning WVs rarely went outside the transmission range. Speedup of SCOCC was 17.2% and 5.2% lower than that of GBTSA and HEFT, but it consumed only 13.1% and 31.2% of the energy that GBTSA and HEFT used for data transmission. This demonstrates the efficiency of SCOCC relative to the amount of energy used.
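The layered random DAGs used in this section can be sketched as follows. This is a simplified generator under our own assumptions: each task draws at least one predecessor from the previous level, whereas the paper's exact edge-probability rule is not reproduced.

```python
import random

def random_layered_dag(depth, width_range, seed=0):
    """Build a layered DAG: a single root task, then depth - 1 levels whose
    widths are drawn uniformly from width_range = (lo, hi). Returns the
    list of levels (task ids) and the list of (pred, succ) edges."""
    rng = random.Random(seed)
    levels, edges, next_id = [[0]], [], 1
    for _ in range(depth - 1):
        width = rng.randint(*width_range)
        level = list(range(next_id, next_id + width))
        next_id += width
        for task in level:
            # each task gets 1..|prev level| random predecessors (assumed rule)
            preds = rng.sample(levels[-1], k=rng.randint(1, len(levels[-1])))
            edges += [(p, task) for p in preds]
        levels.append(level)
    return levels, edges

# Shallow/wide: random_layered_dag(3, (4, 8)); deep/narrow: random_layered_dag(6, (3, 4))
```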
Figs. 10∼12 illustrate the performance for the deep and narrow DAGs. Fig. 10 shows Speedup according to the number of tasks. The Speedup values were 2.86, 2.47, 2.44, and 2.15 for GBTSA, HEFT, SCOCC, and random scheduling, respectively, which are not far apart. The performance gap between SCOCC and GBTSA or HEFT narrowed compared to the case of the shallow and wide DAGs, since the constraint on the number of WVs in SCOCC does not significantly harm parallelism when there are only a few tasks per level. As for energy consumption, GBTSA, random scheduling, and HEFT consumed 56.9 kJ, 47.3 kJ, and 26.7 kJ, respectively (Fig. 11), while SCOCC used 12.4 kJ, only 21.8% of the energy GBTSA consumed.
Finally, Fig. 12 compares p_out. Whereas p_out was 0.190, 0.191, and 0.163 for GBTSA, random scheduling, and HEFT, respectively, it was just 0.032 for SCOCC, only 16.8% of the GBTSA value. For deep DAGs with a depth of 6, Speedup of SCOCC was 13.6% and 1.1% lower than that of GBTSA and HEFT, but SCOCC saved as much as 53.8% and 78.3% of the energy compared to HEFT and GBTSA, respectively. This substantiates that as the DAG depth increases, SCOCC becomes increasingly more energy efficient than the others without lagging in performance.

D. GAUSSIAN ELIMINATION
In order to evaluate the performance with actual applications, we adopted two applications. The first is Gaussian elimination (GE). GE is widely used for matrix computation in various machine learning algorithms, so it is likely to be used in various vehicular AI applications as well. Fig. 13 depicts the DAG of GE for a matrix size of 5. For a fairer comparison, we used the communication-to-calculation ratio (ccr) [35], the communication cost divided by the computation cost on a specific machine. The ccr varies from 0.2 to 1.0 in steps of 0.1.
Fig. 14 shows Speedup against the ccr for each algorithm. When the ccr was 0.2, Speedup of GBTSA was 1.82, higher than the 1.73 of HEFT and 1.72 of SCOCC. Random scheduling had the lowest Speedup, 1.51. However, as the ccr increased, Speedup of GBTSA continuously degraded, falling below all the others when the ccr reached 0.6. This is because GBTSA seeks to improve performance by copying data to all succeeding tasks; the advantage of data duplication was offset by increased energy costs. In fact, HEFT and SCOCC showed the best Speedup once the ccr exceeded 0.3. As the ccr increased to 1.0, Speedup of all the methods decreased to 1.0, meaning no speedup can be expected through computation offloading.
Fig. 15 illustrates the energy consumed for data transmission against the ccr. Random scheduling consumed the most energy on average, but when the ccr was greater than 0.8, GBTSA used the most. The data transfer energy of HEFT slightly decreased as the ccr increased because the number of offloaded tasks was gradually reduced. Overall, SCOCC consumed the least energy for data transmission: only about 36.8% of GBTSA, 35.5% of HEFT, and about 20.6% of random scheduling. Fig. 16 shows p_out against the ccr. For all ccrs, p_out of random scheduling was the highest at 0.198, and SCOCC consistently achieved the lowest at 0.021.
The averages of GBTSA and HEFT were around 0.10, about five times that of SCOCC. While p_out of GBTSA increased rapidly with the ccr, that of HEFT slightly decreased as the ccr increased.
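The ccr defined above gives a simple way to derive communication costs from computation costs; the following sketch only restates that ratio, with the sweep range used in the experiments:

```python
def comm_cost(comp_cost, ccr):
    """ccr = communication cost / computation cost on a reference machine,
    so a transfer tied to a task costing `comp_cost` is modeled as
    costing ccr * comp_cost."""
    return ccr * comp_cost

# Sweeping ccr from 0.2 to 1.0 in steps of 0.1, as in the experiments:
ccrs = [round(0.2 + 0.1 * i, 1) for i in range(9)]
```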

E. FAST FOURIER TRANSFORM
The second real application is the Fast Fourier Transform (FFT), which is also widely used to evaluate scheduling algorithms. Since FFT is a basic module in all wireless communication, it will be used in vehicular networks as well. FFT performs computation on N = 2^k points; we set k = 2, as shown in Fig. 17. Fig. 18 shows Speedup for each algorithm against the ccr. When the ccr was 0.2, the Speedup values of SCOCC, GBTSA, and HEFT were 2.32, 2.31, and 2.27, respectively. When the ccr was greater than 0.4, the four algorithms showed almost the same performance, and when the ccr reached 1.0, Speedup approached 1, leaving no motivation to use task offloading. Fig. 19, the energy consumption for transmission, shows a pattern similar to that of GE. On average, random scheduling consumed the most energy, and the energy consumption of GBTSA increased sharply when the ccr reached 0.6 or higher. The energy consumption of HEFT and SCOCC did not change much as the ccr increased. SCOCC used 6.8 kJ on average for FFT, compared with just 3.8 kJ for GE, because the larger number of edges in the FFT DAG increases the communication overhead.
Lastly, Fig. 20 depicts p_out for the FFT application. Overall, p_out of all the other algorithms was high, around 0.15, while SCOCC achieved very low probabilities of less than 0.04 for all ccr values.
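The dependency structure of the N = 2^k point FFT can be sketched as a butterfly task graph. This models only the standard butterfly stages, under our own node naming, and is not necessarily identical to the DAG of Fig. 17:

```python
def fft_dag(k):
    """Edges of an N = 2**k point FFT butterfly DAG. Node (s, i) is the
    value at index i after butterfly stage s; each node depends on the
    same-index node and its butterfly partner in the previous stage."""
    n = 2 ** k
    edges = []
    for s in range(1, k + 1):
        d = 2 ** (s - 1)                       # partner distance at stage s
        for i in range(n):
            edges.append(((s - 1, i), (s, i)))      # same index
            edges.append(((s - 1, i ^ d), (s, i)))  # butterfly partner
    return edges
```

With k = 2 (N = 4) as in the evaluation, this yields two stages of four butterfly outputs, each depending on exactly two predecessors.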

VI. DISCUSSION
We performed simulations on randomly generated DAGs and the DAGs of GE and FFT to compare the proposed SCOCC with HEFT, GBTSA, and random scheduling. The simulation results show that SCOCC can save data transfer energy by 70% compared to the other algorithms and reduce the proportion of cooperating vehicles outside each other's V2V communication range by as much as 84%, although its speedup is 9% lower than the others. This demonstrates the outstanding energy efficiency of SCOCC and the high stability of the VANETs it builds.
On the other hand, this study has clear limitations. The larger the ccr, the lower the performance of cooperative computing. Additionally, as vehicle speed increases and the directions of movement become more diverse, the stability of VANET clouds deteriorates. Also, SCOCC cannot maximize its advantages on DAGs with shallow depths, so it may not be a good choice when the execution speed of shallow-DAG applications is the top priority. Otherwise, SCOCC is energy efficient and reliable, with sufficient performance improvements for most applications. The features of each algorithm are summarized in Table 8.
Lastly, SCOCC does not provide a specific MAC protocol best suited to its operation. The cellular and DSRC/IEEE 802.11p MACs themselves provide error recovery using FEC and ACK mechanisms, but the 802.11p MAC cannot prevent the hidden terminal problem, which can degrade the proposed method's performance. In future work, we will extend the proposed method with a MAC such as Enhanced Distributed Channel Access (EDCA) to reduce the offloading failure probability. The advanced SCOCC will also be tested on a variety of road topologies in addition to the Manhattan Street network.

VII. CONCLUSION
This paper proposed SCOCC to establish a stable VANET cloud and perform energy-efficient computation offloading between vehicles. To achieve this goal, SCOCC i) delays task assignment as much as possible and ii) considers the distance from the client vehicle when selecting worker vehicles. Reducing the time interval between task assignment and execution enhances VANET stability, and reducing the average inter-vehicle distance helps both stability and energy consumption.
Unlike static algorithms such as HEFT and GBTSA, SCOCC does not try to optimize the execution speed of all tasks at once, so it does not show better performance in terms of execution time. However, SCOCC achieves the stability essential in a fast-moving vehicular environment while minimizing the energy consumed for wireless communication.
MINYEONG GONG received the B.S. degree in electrical and computer engineering from the University of Seoul, South Korea, in 2016, where he is currently pursuing the Ph.D. degree in electrical and computer engineering. His research interests include vehicle networks, the IoT, and cloud computing.
YOUNGHWAN YOO (Member, IEEE) received the B.S. and M.S. degrees in computer engineering, in 1996 and 1998, respectively, and the Ph.D. degree in electrical engineering and computer science, all from Seoul National University, Seoul, South Korea. Dr. Yoo has served as a program committee member or a reviewer for a variety of journals and conferences.
SANGHYUN AHN (Member, IEEE) received the B.S. and M.S. degrees in computer engineering from Seoul National University, Seoul, South Korea, in 1986 and 1988, respectively, and the Ph.D. degree in computer science from the University of Minnesota, in 1993.
She is currently a Professor with the Department of Computer Science and Engineering, University of Seoul, Seoul. Her research interests include wireless ad hoc, sensor and vehicular networks, internet protocols, routing protocols, and the IoT. VOLUME 11, 2023