Decentralized Convex Optimization for Joint Task Offloading and Resource Allocation of Vehicular Edge Computing Systems

Vehicular Edge Computing (VEC) systems exploit resources on both vehicles and Roadside Units (RSUs) to provide services for real-time vehicular applications that cannot be completed in the vehicles alone. Two types of decisions are critical for VEC: one is for task offloading, to migrate vehicular tasks to suitable RSUs, and the other is for resource allocation at the RSUs, to provide the optimal amount of computational resource to the migrated tasks under constraints on response time and energy consumption. Most published optimization-based methods determine the optimal solutions of the two types of decisions jointly within one optimization problem at the RSUs, but the complexity of solving this optimization problem is prohibitive, because the problem is non-convex and has discrete variables. Meanwhile, the nature of centralized solutions requires extra information exchange between vehicles and RSUs, which introduces additional communication delay and security issues. The contribution of this paper is to decompose the joint optimization problem into two decoupled subproblems: task offloading and resource allocation. Both subproblems are reformulated for efficient solutions. The resource allocation problem is simplified by dual decomposition and can be solved at the vehicles in a decentralized way. The task offloading problem is transformed from a discrete problem into a continuous convex one by a probability-based solution. Our new method efficiently achieves a near-optimal solution through decentralized optimizations, and the error bound between this solution and the true optimum is analyzed. Simulation results demonstrate the advantage of the proposed approach.


I. INTRODUCTION
The substantial increase in the number of connected vehicles and the latest advances in autonomous driving lead to the emergence of various services and applications in the intelligent transportation system, such as online path planning, real data playback, localization, and perception [1]. These high-complexity applications demand extraordinary computation capacities, but resource-constrained vehicles may not be capable of serving the ever-increasing computational needs of new applications within their latency deadlines [2]. Driven by the evolution of wireless communication, Vehicular Edge Computing (VEC) systems, supported by Mobile Edge Computing (MEC) [4], [5], are recognized as a promising paradigm in the development of vehicular networks. Owing to the close proximity to vehicles [6], VEC systems can provide computing services in Roadside Units (RSUs) with reduced end-to-end transmission delays.
To facilitate VEC in accelerating task completion and saving energy, the development of a vehicle computation offloading policy is crucial. Existing studies focus on the design of optimal offloading strategies to meet different performance requirements, such as low latency [7], high energy efficiency [8], and load balancing [6]. There are mainly two challenges for task offloading: the offloading decision, which determines where the task is to be executed, and the resource allocation, which characterizes how much computation and communication resource is allocated to the tasks. The formulation of this problem naturally has an intricate structure, which makes it challenging to obtain an optimal solution, especially since the solution space grows exponentially with the number of variables in high-dimensional scenarios.
To address this challenge, researchers have proposed many different solution approaches lately. The problem is formulated as a constrained optimization problem in [9], [10], [11], [12], [13] to minimize the offloading delay. To investigate a holistic offloading solution in a multi-server MEC-assisted network, Tran et al. [10] decompose the original problem and find the resource allocation solution by quasi-convex optimization techniques, where the task offloading problem is tackled by the proposed heuristic approach. Tang et al. [12] address the energy-constrained delay minimization problem, which is solved using the decision tree and dynamic programming. The total network delay is emphasized in [13], where a Lyapunov optimization is used for the development of an online multi-decision making algorithm. Besides the above works that apply optimization techniques, studies on vehicular offloading policy also exploit Reinforcement Learning (RL) algorithms [14], [15], [16], [17], [18]. For instance, Qi et al. [17] consider the data dependency in multiple tasks and apply the deep RL algorithm to find the long-term optimal offloading policy. To minimize the processing delay in VEC networks, Guo et al. [18] design an intelligent task offloading scheme based on deep Q learning, which is a centralized approach that requires all vehicles' information to be collected at a central RSU.
In the aforementioned studies, the vehicle plays the role only of a service client, and the offloading strategy is calculated and determined at the edge device. When the problem is both formulated and solved in a centralized way, the vehicles must send task parameters to the RSU and wait for the offloading decisions returned from the RSU. The potential problems caused by this information exchange, in terms of additional communication delay, increased computing complexity, and security risks, are not fully addressed yet. As an alternative, a decentralized offloading policy is desired.
Recently, a few studies have also focused on exploiting the benefits of decentralized computation offloading in VEC systems [19], [20], [21], [22], [23]. An adaptive learning-based task offloading algorithm is proposed in [19] based on multi-armed bandit theory. It works in a distributed manner and minimizes the average offloading delay. The vehicle-to-vehicle communication is considered in [20], where a decentralized resource allocation mechanism is proposed based on deep RL, and the global information is not required for each vehicle to make its decisions. The consensus ADMM-based energy-efficient resource allocation algorithm is proposed in [21], where the formulated joint problem is decomposed into a set of subproblems and solved in parallel. Jošilo et al. [22] develop a game-theoretical model and allow users to make offloading decisions autonomously. Liu et al. [23] design a user-centric control policy to optimize both delay and energy consumption by formulating the problem as a fully decentralized multi-agent Markov decision process.
Most of the existing decentralized solutions rely on a trained deep RL model or reach global coordination in an iterative way, which still requires high computational power or synchronous updates among vehicles when implemented in a real-time offloading application. In the context of vehicular offloading, the high computation power requirements call for effective energy management at the RSUs, an issue that has not been emphasized widely. Most existing studies concentrate on reducing energy consumption at the vehicle but ignore the corresponding analysis at the edge [24]. Some recent studies [25], [26], [27] have investigated this topic by considering the optimization of both vehicles and RSUs. They optimize the energy consumption and the execution time of the holistic vehicular services, including vehicles, RSUs, and base stations. These works justify the need for optimizing both the vehicles and the RSUs. In addition, because of the high mobility of vehicles and the limited coverage of the RSUs, a vehicle cannot select offloading destinations arbitrarily. To address these problems, we develop a decentralized convex optimization approach that decomposes a holistic Mixed-Integer Nonlinear Problem (MINLP) into a hierarchy of convex optimization problems. The decomposition is obtained by dual decomposition and a probability-based offloading policy.
The main contributions are summarized as follows:
• A decentralized task offloading and resource allocation problem in a multi-server VEC system is formulated as an optimization problem. The optimization criteria include the total latency of all tasks and the energy consumption in both vehicles and RSUs.
• A hierarchical decomposition approach is designed to break down the original MINLP into a group of convex subproblems for optimal resource allocation. These subproblems have low complexity and can be efficiently solved at the vehicle side. The vehicles only receive broadcast messages from RSUs, which enhances user privacy and reduces information exchange delay.
• A convexification procedure is presented to transform the discrete optimization problem for task offloading into a continuous convex one. The integer design variables of deterministic task offloading targets are replaced by probabilities for offloading targets.
• To examine the application of the proposed decomposition approach, we analyze two common RSU deployment scenarios. The task offloading and resource allocation methods are studied and evaluated in both scenarios.
The rest of this paper is organized as follows. In Section II, the system model is presented with the formulation of the joint task offloading and resource allocation problem. Section III describes the hierarchical decomposition approach and solves the resource allocation problem in a single RSU scenario. We extend the solution to the multi-RSU scenario and investigate the task offloading problem for load forecast coordination in Section IV. The numerical performance evaluation of the proposed methods is given in Section V, and Section VI concludes this paper.

II. SYSTEM MODEL
In this section, we present the system model of the VEC network. After that, a vehicle-edge task offloading and resource allocation problem is formulated.

A. Vehicular Computing System
As illustrated in Fig. 1, we consider a highway scenario with m RSUs and n vehicles. Let M = {1, 2, . . . , m} denote the index set of RSUs and N = {1, 2, . . . , n} denote the index set of vehicles. We use j ∈ M as the index of the RSU and i ∈ N as the index of the vehicle. The edge computing network consists of RSUs, and each RSU contains an MEC server. The server provides wireless radio access and computation resources to the vehicles. The computing tasks of the vehicles can be offloaded to an RSU so that the driving performance can be improved by reducing the task execution time. The notation used in this paper is summarized in Table I.
We consider that vehicles have to solve periodic tasks with period ΔT. At every time step t, every vehicle i ∈ N generates a task. In the offloading system, we assume that the period ΔT is the longest acceptable delay to finish a task. The computing task from vehicle-i can be characterized by [L_i, C_i], where L_i is the length of the interactive transmission data (in bits) between vehicles and RSUs. We consider that the transmission data consist mainly of the computation instruction, and thus L_i is viewed as a constant. C_i represents the amount of computation resources (in CPU cycles) required to finish the task, with mean value C̄_i [28].
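As a concrete illustration, the task model [L_i, C_i] can be captured by a small container; the class and field names below are hypothetical, not from the paper.

```python
from dataclasses import dataclass

# A minimal container for the periodic task model [L_i, C_i].
# Field names are illustrative, not the paper's notation.

@dataclass(frozen=True)
class Task:
    L_i: float  # interactive transmission data length (bits), a constant
    C_i: float  # required computation (CPU cycles), random with a known mean

# One task is generated per vehicle every period Delta_T, which also serves
# as the task's deadline:
DELTA_T = 0.1  # seconds (illustrative value)
task = Task(L_i=8e5, C_i=2e9)
assert task.L_i > 0 and task.C_i > 0
```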
In this study, a task is viewed as atomic and cannot be split [10]. Hence, a task cannot be computed partially on different computation nodes. In particular, we focus on computation-resource-demanding tasks that have to be accomplished at the edge side [29], such as high-complexity motion planning and control modules. Further potential applications for offloading are, for example, given by cooperative driving automation scenarios [30]. Thus, we only consider the pure offloading case where a task is entirely executed on a single RSU.

B. Traffic Scenario Model
The traffic system in our study is a bidirectional road network. To describe the coverage of communication, we denote by T ⊆ R^2 the set of two-dimensional geographical locations of every drivable position in the system, and by p_i ∈ T the position of vehicle-i. Locations of RSUs are denoted by r_j. The distance between two positions p_1 and p_2 is measured by the Euclidean norm ‖p_1 − p_2‖, p_1, p_2 ∈ T ∪ {r_1, . . . , r_m}.
We define a circular area in the traffic system as U, which can be viewed as the set of all drivable positions within the area. Suppose the coverage radius of RSU-j is d_j; then U(r_j, d_j) ≜ {p ∈ T | ‖r_j − p‖ ≤ d_j} represents the set of positions on the road within the communication range of RSU-j. In this study, RSU-j is available for receiving a task from a vehicle only when the vehicle's position is within U(r_j, d_j). For simplicity, we consider that the context transfer among RSUs is well managed by proactive service migration [31]. We also assume that the vehicles' movements and the offloading workload in an area can be accurately estimated. Therefore, the context transfer delay across RSUs is considered to be known. The context transfer delay is assumed to be proportional to the task computation workload C_i and is estimated as εC_i, for some ε > 0 [12]. Also, we define the expected offloading demand E(∑_{i | p_i ∈ U(r_j, d_j)} C_i) for the area. Note that the prediction method and task migration design are not the focus of this work but will be investigated in the future. State-of-the-art methods are, for instance, pre-migration [32] and mobility-based services migration prediction (MSMP) [33].
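The coverage set U(r_j, d_j) is just a disk membership test, which a vehicle can evaluate locally to find its feasible offloading targets. A minimal sketch with hypothetical names:

```python
from math import hypot

# Coverage test for U(r_j, d_j) = {p : ||r_j - p|| <= d_j}; positions are
# 2-D (x, y) tuples. Helper names are illustrative, not from the paper.

def in_coverage(p, r_j, d_j):
    """True iff position p lies within RSU-j's communication range."""
    return hypot(p[0] - r_j[0], p[1] - r_j[1]) <= d_j

def feasible_rsus(p_i, rsus):
    """Indices of RSUs that can currently serve a vehicle at p_i."""
    return [j for j, (r_j, d_j) in enumerate(rsus) if in_coverage(p_i, r_j, d_j)]

rsus = [((0.0, 0.0), 300.0), ((500.0, 0.0), 300.0)]  # (location, radius)
assert feasible_rsus((100.0, 0.0), rsus) == [0]      # exclusive area of RSU-0
assert feasible_rsus((250.0, 0.0), rsus) == [0, 1]   # overlapped area
```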
We assume that each drivable position on the road is covered by at least one RSU. If the distance between two RSUs is smaller than the sum of their coverage radii, the corresponding service areas overlap. When solving the RSU deployment problem in similar scenarios, the authors of [34] prove that, in terms of profit maximization, a non-overlapping solution performs no worse than an overlapping one. However, the non-overlapping solution does not consider the demand disparity under different traffic densities. Due to the heterogeneity of the RSU capacity, it may fail to handle overload situations. To examine our approach in a more complex scenario with high workload, we thus also consider a deployment architecture with overlapped areas. More details of the scenario are given later in Section IV.

C. Computation Model
As mentioned in Section II-A, this study focuses on periodic tasks that cannot be accomplished onboard the vehicles and should thus be offloaded to an RSU for execution. We consider that the computation capacity of the RSU is provided by a CPU supporting Dynamic Voltage and Frequency Scaling (DVFS) [19]. DVFS is a technique to adjust the frequency of a CPU for balancing the energy consumption and task execution time. In practice, the available frequency is restricted to a finite set of values F = {f_j1, f_j2, . . . , f_jN}, referred to as the available clock-frequency vector for RSU-j. Similar to [35], [36], [37], we assume that the difference between consecutive frequencies is so small that the frequency can be approximately treated as a continuous variable. The solution can then be viewed as a reference for the performance upper bound of realistic offloading policies. We denote by f_j the maximum nominal CPU frequency of RSU-j. Let u_ij ∈ (0, 1] be the scaling ratio of f_j determined for vehicle-i to finish the task. Thus, the execution time is

T^o_ij = C_i / (u_ij f_j),   (1)

where the superscript o denotes the computation part in the sequel. An important aspect of computation is the energy consumption of task execution. We focus on the dynamic energy consumption of task execution [38]. Other energy consumption on the RSU, such as energy consumption in idle mode and for the cooling system, is assumed constant. The power consumption of a CPU running at frequency f is modeled as θ_j f^3, where θ_j is the energy consumption coefficient depending on the chip architecture [8]. Then, the energy consumption of task-i executed at RSU-j is

E^o_ij = θ_j (u_ij f_j)^3 T^o_ij = θ_j (u_ij f_j)^2 C_i.   (2)
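The time/energy trade-off of DVFS can be sketched directly. The helper names are hypothetical, and the model assumed here is the standard DVFS formulation: execution time C_i/(u_ij f_j), and dynamic energy obtained by integrating the power θ_j f^3 over the execution time.

```python
from math import isclose

# Sketch of the DVFS computation model (illustrative names).

def execution_time(C_i: float, u_ij: float, f_j: float) -> float:
    """Execution time of a C_i-cycle task at CPU frequency u_ij * f_j."""
    return C_i / (u_ij * f_j)

def execution_energy(C_i: float, u_ij: float, f_j: float, theta_j: float) -> float:
    """Dynamic energy: power theta_j*(u*f)^3 times execution time."""
    return theta_j * (u_ij * f_j) ** 2 * C_i

# Lowering the frequency trades time for energy:
C, f, theta = 1e9, 2e9, 1e-28
t_full, e_full = execution_time(C, 1.0, f), execution_energy(C, 1.0, f, theta)
t_half, e_half = execution_time(C, 0.5, f), execution_energy(C, 0.5, f, theta)
assert isclose(t_half, 2 * t_full)     # half frequency doubles the delay
assert isclose(e_half, 0.25 * e_full)  # ...but quarters the dynamic energy
```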

D. Communication Model
The vehicles can transmit data to the RSUs using wireless communications. We denote by p̄_i the maximal transmission power of vehicle-i, and by r_ij p̄_i the actual transmission power used by vehicle-i for transmitting a task to RSU-j, where r_ij ∈ (0, 1] is a decision variable. The decision variable r_ij allows the vehicle to trade off between energy consumption and transmission delay [23]. Let h_ij = g(‖p_i − r_j‖) be the channel gain from vehicle-i to RSU-j. The channel gain includes path loss and fading, and is a function of the distance ‖p_i − r_j‖ between RSU-j and vehicle-i [9]. We consider that Orthogonal Frequency Division Multiple Access (OFDMA) is used [22]. If a vehicle tries to communicate with an RSU, it occupies a fixed bandwidth B_p. If the system bandwidth is B_j, then at most N_j = ⌊B_j / B_p⌋ vehicles can offload to RSU-j. Given the noise power δ_j, the transmission rate can be expressed as

q_ij = B_p log_2(1 + r_ij p̄_i h_ij / δ_j).   (3)

We consider that the transmission delay from the RSU to the vehicle is negligible, since the result of the computation is typically much smaller than the input data. Thus, given L_i, the transmission delay can be calculated as

T^c_ij = L_i / q_ij,   (4)

where the superscript c denotes the communication part in the sequel. If a context transfer occurs, the extra delay is εC_i. The energy consumption of data transmission can be expressed, based on the transmit power and the transmission delay, as

E^c_ij = r_ij p̄_i T^c_ij.   (5)

We do not account for the communication energy consumption on RSUs.
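A sketch of the rate, delay, and energy relations under the Shannon-capacity model assumed here; all helper names and parameter values are illustrative.

```python
from math import log2

# Sketch of the OFDMA uplink model. r_ij in (0, 1] scales the maximal
# transmit power p_max; the rate follows the Shannon formula on the fixed
# per-vehicle band B_p. Names are illustrative, not the paper's notation.

def rate(r_ij, p_max, h_ij, delta_j, B_p):
    """Achievable uplink rate (bit/s) at transmit power r_ij * p_max."""
    return B_p * log2(1.0 + r_ij * p_max * h_ij / delta_j)

def tx_delay(L_i, q_ij):
    """Transmission delay of an L_i-bit task at rate q_ij."""
    return L_i / q_ij

def tx_energy(r_ij, p_max, T_c):
    """Transmission energy: transmit power times transmission delay."""
    return r_ij * p_max * T_c

# Raising r_ij shortens the delay, but the rate grows only logarithmically
# in the power, which is exactly the trade-off r_ij is meant to control:
B_p, p_max, h, delta, L = 1e6, 0.2, 1e-6, 1e-9, 8e5
q_lo, q_hi = rate(0.5, p_max, h, delta, B_p), rate(1.0, p_max, h, delta, B_p)
assert q_hi > q_lo
assert tx_delay(L, q_hi) < tx_delay(L, q_lo)
```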

E. Problem Formulation
Our objective is to minimize the total task response times and the communication energy at the vehicles while satisfying the computation energy constraints at the RSUs. We focus on a single time step as a building block for solving the finite-horizon version of the problem, which we leave as the subject of future work. In what follows, we refer to the response time as the total task completion time, including computation, communication, and context transfer delay. Note that the term response time is used for simplicity's sake and does not rigorously follow the definition in real-time scheduling theory [39], since task preemption and blocking are omitted in this study.
We denote by φ the offloading decision variables: φ_ij ∈ {0, 1}, where φ_ij = 1 if vehicle-i offloads its task to RSU-j, and φ_ij = 0 otherwise. To estimate the possible context transfer delay, we also consider the decision matrix φ' at the next time step. Note that φ' may not reflect the actual offloading decision, but is only used to check the offloading availability at the next time step based on the vehicle position. The fundamental assumption is that if vehicle-i offloads a task to RSU-j at the current step-t and is still within U(r_j, d_j) at the next time step, then the vehicle maintains the same offloading target, i.e., φ'_ij = φ_ij = 1; otherwise φ'_ij = 0. Under the latter situation, the context transfer occurs, and the additional communication delay must be counted. Since the road network is fully covered by all RSUs, there must be some j' ≠ j such that ‖r_j' − p_i(t + ΔT)‖ ≤ d_j' and φ'_ij' = 1. The exact value of j' is irrelevant for the optimization problem, because we count the context transfer delay for vehicle-i as

Δd_i = εC_i ∑_{j∈M} φ_ij (1 − φ'_ij).

Thus, we can formulate the joint problem of task placement and resource allocation with the objective of minimizing the weighted sum of overall task response times and communication energy consumed at vehicles. η_i in the objective function is the weighting parameter used to balance delay and energy saving. The problem is defined in (P0):

(P0): min_{φ, φ', u, r} ∑_{i∈N} ∑_{j∈M} φ_ij (T^o_ij + T^c_ij + η_i E^c_ij) + ∑_{i∈N} Δd_i   (6a)
s.t. ∑_{i∈N} φ_ij u_ij ≤ U_j, ∀j ∈ M,   (6b)
     ∑_{i∈N} φ_ij E^o_ij ≤ E^o_j, ∀j ∈ M,   (6c)
     ∑_{j∈M} φ_ij (T^o_ij + T^c_ij) + Δd_i ≤ ΔT, ∀i ∈ N,   (6d)
     u_ij, r_ij ∈ (0, 1], ∀i ∈ N, j ∈ M,   (6e)
     ∑_{j∈M} φ_ij = 1, ∀i ∈ N,   (6f)
     ∑_{j∈M} φ'_ij = 1, ∀i ∈ N,   (6g)
     φ_ij, φ'_ij ∈ {0, 1}, ∀i ∈ N, j ∈ M,   (6h)
     ∑_{i∈N} φ_ij ≤ N_j, ∀j ∈ M,   (6i)
     φ_ij ≤ 1{p_i(t) ∈ U(r_j, d_j)}, φ'_ij ≤ 1{p_i(t + ΔT) ∈ U(r_j, d_j)}, ∀i ∈ N, j ∈ M.   (6j)

In the set of constraints, (6b) bounds the total computation utilization on each RSU, where U_j represents the upper bound on CPU utilization. Constraint (6c) regulates the energy consumption at the RSU, where E^o_j denotes the energy consumption limit at RSUs. (6d) guarantees the time constraint satisfaction, while (6e) determines the ranges of the control variables. For each task, a vehicle can only offload to one RSU; thus, the sum of φ_ij over all RSUs should be 1, which is reflected in (6f) ∼ (6h). (6i) defines the upper limit on the number of offloading vehicles, and (6j) examines the RSU service availability due to vehicle mobility.

III. HIERARCHICAL DECOMPOSITION AND RESOURCE ALLOCATION PROBLEM SOLVING
Problem (P0) is nonlinear and has the binary variables φ ij . It is an MINLP and generally difficult to solve. To address this issue, we propose a hierarchical decomposition approach in this section, which achieves a close-to-optimal solution that can be computed in a decentralized way. Fig. 2 gives an overview of the hierarchical decomposition approach. The essential idea is to decompose the original problem into several decoupled subproblems, each of which is represented as a block in the figure. They have lower complexities and can be solved efficiently. All methods to perform the decomposition in this study are labeled in grey. The signals on the dashed lines connecting different blocks are solutions from the prior problems, which are used for coordination or problem-solving in sequential problems.
The domain of the variables of (P0) is huge and not continuous, and the problem requires the simultaneous exploration of three groups of decision variables. One way to simplify the problem is to search for the optimal solutions of the three groups of variables sequentially via the Tammer decomposition method [10]. If we first fix an arbitrary task offloading decision, then (P0) becomes an optimization problem with only the free variables u and r. Note that the context transfer delay Δd_i is independent of u and r. Then (P0) is reduced to the subproblem

J*(φ) = min_{u, r} ∑_{i∈N} ∑_{j∈M} φ_ij (T^o_ij + T^c_ij + η_i E^c_ij) + ∑_{i∈N} Δd_i,  s.t. (6b) ∼ (6f).   (7)

Note that if the given task offloading decision φ is infeasible, then no solution of u and r may satisfy all constraints of (6b) ∼ (6f). In that case, let J*(φ) = ∞. Since the reduced optimization problem (7) has only u and r as free variables, it is called the resource allocation (RA) problem.
Subsequently, the search for the optimal task offloading decision is named the task offloading (TO) problem:

min_{φ, φ'} J*(φ),  s.t. (6f) ∼ (6j).   (8)

The decomposition of the overall problem (P0) into the TO and the RA problems is illustrated at the top of Fig. 2. By the definition of the RA problem in (7), since the value of φ is given, we can determine the set of vehicles offloading to each RSU-j as

V_j = {i ∈ N | φ_ij = 1}.   (9)

Then the objective function of the RA problem in (7) becomes:

∑_{j∈M} ∑_{i∈V_j} (T^o_ij + T^c_ij + η_i E^c_ij) + ∑_{i∈N} Δd_i.

The RA problem is thus equivalent to the subproblems of finding the optimal resource allocation at each RSU. The individual problem at RSU-j is defined in (P1):

(P1): min_{u, r} ∑_{i∈V_j} (T^o_ij + T^c_ij + η_i E^c_ij)   (10a)
s.t. ∑_{i∈V_j} u_ij ≤ U_j,   (10b)
     ∑_{i∈V_j} E^o_ij ≤ E^o_j,   (10c)
     T^o_ij + T^c_ij ≤ ΔT_i, ∀i ∈ V_j,   (10d)
     u_ij, r_ij ∈ (0, 1], ∀i ∈ V_j.   (10e)

In (P1), we simplify the problem by removing the φ-related elements. Since only one RSU-j is considered, the dimensions of the variables are decreased, and u, r ∈ (0, 1]^{|V_j|}. The time constraint is stated with ΔT_i in (10d) to compensate for the context transfer delay, with ΔT_i = ΔT − Δd_i. Once the decision variable φ is given, the corrected values ΔT_i are known at the vehicles. As illustrated in Fig. 2, the RA problem for the entire transportation system is reduced to the decentralized RA problems at individual RSUs.
Note that (P0) and (P1) share a common disadvantage: they are formulated and solved in a centralized manner, which requires the complete observation of all vehicles connected to the same RSU. Such an approach places the decision algorithms at the RSUs, where they acquire the detailed task information from each vehicle and then calculate the solution. This introduces an extra communication burden and may, by its nature, cause security and privacy concerns. Moreover, the optimization problem may become intractable for large-scale systems, since excessive delay is introduced. Thus, in Section III-B, we formulate a decentralized strategy that overcomes this issue via dual decomposition.

A. Convexification of the RA Problem
The objective function and constraints in (P1) are continuous and twice differentiable in their domains. However, by the properties of quasiconvexity (see, e.g., [40, Section 3.4.2]), it can be found that E^c_ij(r_ij) in (10a) is quasilinear but not convex. In order to obtain convexity, we can transform the problem by substituting the variable r_ij. Note that the transmission rate q_ij is always non-zero, which ensures the feasibility of the substitution. Let us introduce ω_ij, where

ω_ij = 1/q_ij = 1 / (B_p log_2(1 + r_ij p̄_i h_ij / δ_j)).   (11)

Then problem (P1) can be rewritten as

(P2): min_{u, ω} ∑_{i∈V_j} (T^o_ij + L_i ω_ij + η_i E^c_ij(ω_ij))   (12a)
s.t. (10b), (10c), (10e),
     T^o_ij + L_i ω_ij ≤ ΔT_i, ∀i ∈ V_j,   (12b)
     ω_ij ≥ ω^min_ij, ∀i ∈ V_j,   (12c)

where E^c_ij(ω_ij) = (δ_j L_i ω_ij / h_ij)(2^{1/(B_p ω_ij)} − 1), and ω^min_ij = 1/(B_p log_2(1 + p̄_i h_ij / δ_j)) is the value of ω_ij attained at r_ij = 1.
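A numerical sanity check (not a proof) of the convexification: assuming the substituted variable is the reciprocal rate, ω_ij = 1/q_ij, the transmit energy as a function of ω passes a midpoint convexity test on a grid. All names and parameter values are illustrative.

```python
# Inverting the Shannon rate gives the power ratio as a function of omega,
# and the transmit energy becomes E^c(omega) = r(omega) * p_max * L_i * omega.
# This sketch assumes omega_ij = 1/q_ij (our reading of the substitution).

def r_of_omega(omega, p_max, h, delta, B_p):
    """Power ratio r achieving rate 1/omega (inverse of B_p*log2(1+r*p*h/d))."""
    return delta / (p_max * h) * (2.0 ** (1.0 / (B_p * omega)) - 1.0)

def energy_of_omega(omega, p_max, h, delta, B_p, L_i):
    """Transmit energy of an L_i-bit task as a function of omega."""
    return r_of_omega(omega, p_max, h, delta, B_p) * p_max * L_i * omega

# Midpoint convexity check on a grid of feasible omega values:
p_max, h, delta, B_p, L_i = 0.2, 1e-6, 1e-9, 1e6, 8e5
grid = [1e-7 * k for k in range(2, 40)]
for a, b in zip(grid, grid[2:]):
    mid = 0.5 * (a + b)
    assert energy_of_omega(mid, p_max, h, delta, B_p, L_i) <= 0.5 * (
        energy_of_omega(a, p_max, h, delta, B_p, L_i)
        + energy_of_omega(b, p_max, h, delta, B_p, L_i))
```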

Lemma 1: (P2) is a convex optimization problem.
Proof: Please refer to Appendix A.

B. Decomposition of the RA Problem
In (P2), the objective function (12a) can be viewed as the sum of the objectives of all vehicles. Also, each vehicle has its own constraints in (10e), (12b), and (12c). However, constraints (10b) and (10c) are coupled among different vehicles, in which the restrictions on CPU utilization and energy consumption can be viewed as aggregate amounts on RSU-j. These coupled constraints hinder us from solving the optimization problem for individual vehicles. To tackle this issue, we form the dual problem by introducing the Lagrange variables λ = (λ_1, λ_2) ∈ R^2_+ for the inequality constraints (10b) and (10c). For RSU-j, we denote by λ_j = (λ_j1, λ_j2) these Lagrange variables. This results in the Lagrangian function:

L(u, ω, λ_j) = ∑_{i∈V_j} (T^o_ij + L_i ω_ij + η_i E^c_ij(ω_ij)) + λ_j1 (∑_{i∈V_j} u_ij − U_j) + λ_j2 (∑_{i∈V_j} E^o_ij − E^o_j).   (13)

Correspondingly, the dual function g(λ_j): R^2_+ → R, as the infimum value of the Lagrangian function over u, ω for λ_j ∈ R^2_+, can be expressed as

g(λ_j) = inf_{u, ω} L(u, ω, λ_j).   (14)

The dual function can be evaluated separately on each vehicle with the Lagrangian variables. Moreover, the optimization objective comprises two parts: computation and communication. They have an independent structure of objective and constraint functions, but are coupled by (12b) in (P2) as the time constraints. To analyze the problem appropriately, we can further break down the problem into single-commodity problems [41] for each vehicle. Assume that the optimal computation time of vehicle-i's task at RSU-j is T^{o*}_ij, where T^{o*}_ij < ΔT_i should always hold; otherwise (P2) does not have a feasible solution. Then, the communication delay must satisfy:

T^c_ij = L_i ω_ij ≤ ΔT_i − T^{o*}_ij.   (15)

Taking T^{o*}_ij as an auxiliary parameter, we can decompose the problem by separating the computation and the communication part of each vehicle. Accordingly, the dual function is

g(λ_j) = ∑_{i∈V_j} (g^o_i(λ_j) + g^c_i) − λ_j1 U_j − λ_j2 E^o_j,   (16)

where g^o_i(λ_j) and g^c_i represent the dual functions of the computation and communication subproblems at vehicle-i:

g^o_i(λ_j) = inf_{u_ij} (T^o_ij + λ_j1 u_ij + λ_j2 E^o_ij),  g^c_i = inf_{ω_ij} (L_i ω_ij + η_i E^c_ij(ω_ij)).   (17)

Note that (17) is independent of the primal coupling constraints (10b) and (10c).
By the definition, the corresponding subproblems are

(P3): min_{u_ij} T^o_ij + λ_j1 u_ij + λ_j2 E^o_ij,  s.t. u_ij ∈ (0, 1],   (18)

and

(P4): min_{ω_ij} L_i ω_ij + η_i E^c_ij(ω_ij)   (19a)
s.t. L_i ω_ij ≤ ΔT_i − T^{o*}_ij,   (19b)
     and (12c).

Problems (P3) and (P4) are the computation and communication subproblems at the vehicle side. Note that the variation of |V_j| in (16) does not influence the optimal solution. Similar to (P2), both subproblems can be verified to be convex, and the duality gap is zero. Hence, the dual optimum obtained by (P3) and (P4) is equivalent to the primal solution of (P2), which enables us to solve the centralized primal problem through the decentralized dual subproblems.
Proposition 1: Given an RSU-j and n vehicles, when the optimal solutions of the dual subproblems (P3) and (P4) at the vehicles converge to the primal global optimum of (P2), the values of the Lagrangian variables (i.e., λ_j ∈ R^2_+) are only influenced by the total amount of computation workload (i.e., ∑_{i∈V_j} C_i) and the energy constraints (i.e., θ_j and E^o_j), and:

λ_j1 = 0,  λ_j2 = (1/(2θ_j)) (θ_j ∑_{i∈V_j} C_i / E^o_j)^{3/2}.   (20)

Proof: Please refer to Appendix B.
Remark 1: Proposition 1 reveals that the associated Lagrangian variable is only influenced by the energy-related parameters and the total amount of task load. Therefore, it enables an RSU to predict the Lagrangian variables from the estimated total workload at the RSU.
Even with the closed-form expression of the Lagrangian variables, the future workload ∑_{i∈V_j} C_i is a posteriori knowledge, which is only available after the optimal task offloading decision φ has been determined. However, as shown in (8), the solution of the TO problem requires the solution of the RA problem. We need to break this circular dependency between the TO and the RA problems. In Section II-B, we assume that the offloading demand of an area can be accurately estimated. Therefore, we apply E(C_j) as the load forecast on RSU-j to replace ∑_{i∈V_j} C_i when deriving the Lagrangian variables and calculating the corresponding objective function. It is given as

λ_j2 = (1/(2θ_j)) (θ_j E(C_j) / E^o_j)^{3/2}.   (21)

Note that in this paper, λ_j1 is always treated as 0, since the existence of E^o_j implies a more stringent constraint on the task response time, and thus the schedulability limit is viewed as relaxed. If the value of E^o_j is unrealistically large, then λ_j1 is not zero, and the time constraint becomes dominant. This extreme case is not explored in this study.
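The RSU-side coordination step can be sketched as follows, under the assumption that the energy multiplier takes the closed form λ_j2 = (θ_j E(C_j)/E^o_j)^{3/2}/(2θ_j) and that the common CPU scaling ratio u* follows from stationarity of the computation subproblem; both forms are our reading of the proposition, and all names are hypothetical.

```python
from math import isclose

# Sketch of the RSU-side coordination step (illustrative names), assuming
# the closed forms stated in the lead-in.

def lambda_j2(theta_j, E_Cj, E_oj):
    """Energy-constraint multiplier from the forecast workload E(C_j)."""
    return (theta_j * E_Cj / E_oj) ** 1.5 / (2.0 * theta_j)

def u_star(theta_j, E_Cj, E_oj, f_j):
    """Common CPU scaling ratio: stationary point of the computation
    subproblem under the multiplier above."""
    lam = lambda_j2(theta_j, E_Cj, E_oj)
    return (2.0 * lam * theta_j) ** (-1.0 / 3.0) / f_j

# Consistency check: with u*, the aggregate dynamic energy
# theta_j * (u* f_j)^2 * E(C_j) exactly exhausts the budget E_oj.
theta, E_C, E_o, f = 1e-28, 5e9, 0.5, 3e9
u = u_star(theta, E_C, E_o, f)
assert 0.0 < u <= 1.0
assert isclose(theta * (u * f) ** 2 * E_C, E_o)
```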
A major argument in Appendix B is (35):

u* = (1/f_j) √(E^o_j / (θ_j ∑_{i∈V_j} C_i)) · 1_{|V_j|},   (35)

where 1_{|V_j|} is a 1 × |V_j| vector of all ones. The closed-form expression of u*_ij is independent of the individual task load C_i. It shows that within the task period [t, t + ΔT), the RSU works with a fixed CPU frequency for all tasks from the connected vehicles. With (21) and (35), the optimal computation time

T^o_ij(u*_ij) = C_i / (u*_ij f_j)   (22)

corresponds to T^{o*}_ij introduced in (19b). Consequently, all terms in (P4) can be explicitly expressed. Since (P4) does not include Lagrangian variables, its optimal solution ω*_ij can be determined independently by convex optimization. Therefore, we have shown that the optimization problems (P3) and (P4) can be efficiently solved with the given λ_j. As depicted in Fig. 2, they give the fully decentralized RA solution for the convex RA subproblem (P2), and thus for the RA subproblem (P1). Accordingly, the computation-intensive optimization at the edge can be circumvented by leveraging the unidirectional communication from RSUs to vehicles with the updated values of the Lagrangian variables. Meanwhile, as the resource provider, the RSU coordinates the vehicles by predicting and adjusting the values of the Lagrangian variables.
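On the vehicle side, the remaining communication subproblem is a one-dimensional convex minimization in ω, so even a simple ternary search suffices. The objective below (delay plus η-weighted transmit energy) is an assumed parameterization, and all names and values are illustrative.

```python
# Vehicle-side sketch: with the Lagrangian variables broadcast by the RSU,
# the communication subproblem reduces to minimizing a unimodal function of
# omega, solvable locally without any inter-vehicle coordination.

def ternary_min(f, lo, hi, iters=200):
    """Minimize a unimodal function f on [lo, hi] by ternary search."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) <= f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

def comm_objective(omega, L_i, eta, p_max, h, delta, B_p):
    """Assumed per-vehicle objective: delay + eta-weighted transmit energy."""
    r = delta / (p_max * h) * (2.0 ** (1.0 / (B_p * omega)) - 1.0)
    return L_i * omega + eta * r * p_max * L_i * omega

L_i, eta, p_max, h, delta, B_p = 8e5, 10.0, 0.2, 1e-6, 1e-9, 1e6
f = lambda w: comm_objective(w, L_i, eta, p_max, h, delta, B_p)
w_star = ternary_min(f, 1e-8, 1e-5)
# The minimizer beats its immediate neighbors on the search interval:
assert f(w_star) <= f(w_star * 1.01) and f(w_star) <= f(w_star * 0.99)
```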
In this section, the problem is restricted to the single RSU scenario with the RA solution. In what follows, we extend our solution by considering the multiple RSUs scenario and solve the top-level TO problem in (P0).

IV. LOAD FORECAST FOR TASK OFFLOADING
In (21) and (22), the optimal computation resource allocation is influenced by E(C j ). Therefore, load forecast is essential to vehicles when making the offloading decision. However, even though we assume that the computational workloads of all vehicles in the transportation system can be estimated from a priori knowledge, we still cannot predict the estimated workload on each RSU, because some vehicles may have the flexibility of offloading to multiple RSUs. For instance, if vehicles offload with the pure greedy strategy, then all the vehicles in the service overlapped area will offload towards the RSU with the smallest predicted workload. Consequently, the selected RSU may quickly have a high workload or even become overloaded. All vehicular tasks offloaded to the RSU will suffer longer response times or even miss the deadline. The greedy offloading strategy is neither optimal nor stable.
This problem brings the challenge of reaching global coordination on load forecast in multiple RSUs scenarios. To accurately determine the expected offloading demand, as illustrated in Fig. 3, we consider the following two cases with different RSU deployment architectures, categorized based on whether the coverage of RSUs is overlapped or not [42].

A. Non-Overlapped RSUs Scenario
Consider a VEC system with m RSUs. Recall that each RSU is characterized by a deployment location r_j and a coverage range d_j. In the non-overlapped architecture, each RSU has its own distinct service area, and each drivable position is covered by only one RSU, i.e., U(r_j1, d_j1) ∩ U(r_j2, d_j2) = ∅ for all j_1, j_2 ∈ M with j_1 ≠ j_2. Under this condition, a vehicle has only one feasible offloading destination. Therefore, in non-overlapped scenarios, the original problem (P0) degenerates to m independent instances of (P1), and the problem can be trivially decoupled. The RA problem can be solved efficiently via (P3) and (P4), while the TO problem can be omitted. Meanwhile, since there are no overlapped areas, RSU-j can explicitly estimate its load within the covered area by

E(C_j) = E(∑_{i | p_i ∈ U(r_j, d_j)} C_i).   (23)

The Lagrangian variables can be estimated with a high-accuracy load estimation system, and the proposed decentralized offloading strategy via decomposition can be achieved. Thus, with the non-overlapped deployment architecture, an optimal offloading solution can be realized.
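In the non-overlapped case the load estimate reduces to summing the forecast workloads of the vehicles inside U(r_j, d_j). A minimal sketch with hypothetical names:

```python
# E(C_j) for a non-overlapped RSU: sum forecast workloads C_i over the
# vehicles covered by RSU-j. Names and values are illustrative.

def forecast_load(vehicles, r_j, d_j):
    """Total expected workload of vehicles within RSU-j's coverage disk."""
    return sum(C_i for (x, y), C_i in vehicles
               if (x - r_j[0]) ** 2 + (y - r_j[1]) ** 2 <= d_j ** 2)

# Three vehicles as ((x, y), C_i) pairs on a straight road segment:
vehicles = [((100.0, 0.0), 2e9), ((900.0, 0.0), 3e9), ((150.0, 0.0), 1e9)]
assert forecast_load(vehicles, (0.0, 0.0), 300.0) == 3e9  # first and third
```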

B. Overlapped RSUs Scenario
In the non-overlapped scenario, the single available offloading destination within an area can become a bottleneck, and the offloading availability of an RSU deteriorates as traffic density increases. Therefore, in order to provide alternative offloading options, we allow the coverage of different RSUs to overlap. When an area is covered by several RSUs, decentralized coordination is quite complex in general. As shown in Fig. 3(b), we restrict the scenario so that an area can be covered by at most two RSUs. Despite this simplification, the combinatorial structure of the problem still makes it challenging to solve in polynomial time [10]. Exhaustive search has complexity up to O(2^n), which is impractical to carry out by enumerating all possible solutions. Moreover, existing heuristic search algorithms for this problem (e.g., [9], [10]) require global information from vehicles and execute iteratively, which contradicts our decentralized design. To analyze the problem, we make the following two assumptions:
Assumption 1: An overlapped area has a similar distance to its adjacent RSUs. For a vehicle in the overlapped area, the optimal values of (P4) with respect to the adjacent RSUs are therefore similar, and the offloading decision of vehicles in the overlapped area is dominated by the result of (P3).
Assumption 2: An overlapped area is located at the boundaries of two RSUs' coverage regions. For vehicles in overlapped areas, the context transfer delay occurs only when offloading to the receding RSU.
Assumption 1 is reasonable because OFDMA is used for communication with the RSUs, as described in Section II-D. With a fixed task and a fixed RA solution, (P4) depends only on the channel power gain, which is determined by the distance. Note that Assumption 2 is also used in [12] for the position offset consideration, and it helps estimate the context transfer delay based on vehicle mobility.
As shown in Fig. 4, we use a directed graph G consisting of m RSU nodes to represent the offloading network under vehicle mobility. Let V_jk be the set of vehicles driving from RSU-j to RSU-k in the overlapped area of the two RSUs, and let a_jk be the estimated total computation workload of all these vehicles, i.e., a_jk = Σ_{i∈V_jk} C_i. In addition, A = [a_jk] ∈ R^{m×m} is the adjacency matrix of G, and a_jk is the weight of the edge from RSU-j to RSU-k. Thus, on a bidirectional road, a_kj is distinct from a_jk owing to the driving directions. If RSU-j and RSU-k have a shared region, then a_jk, a_kj ≥ 0; otherwise a_jk = a_kj = 0. The self-loop weight a_jj represents the workload in the exclusive area of RSU-j.
Finding binary task offloading decisions for the vehicles in the overlapped areas has exponential complexity. To avoid the binary optimization problem, we propose a probabilistic task offloading strategy. As shown in Fig. 2, the strategy determines the probabilities of allocating tasks to the adjacent RSUs, which supports the calculation of E(C_j) needed to solve (P2). The advantage of this approach is that it replaces the original binary decision problem with a continuous optimization problem.
The task offloading decision is made by optimizing the TO problem in (8), where the cost function J*(φ) is evaluated from the optimal values of (P3) and (P4). According to Assumption 1, the cost function J*(φ) is primarily determined by (P3). Furthermore, the solution to (P3) gives the optimal computation time of a task at an RSU, as estimated by (22). That equation implies that, for all vehicles in the same area, the optimal computation time of a task increases monotonically with the total workload at the connected RSU. In addition, Assumption 2 implies that the offloading probability also depends on the driving direction of the vehicle. Therefore, all vehicles in V_jk, which lie on the same edge of G, should share an identical offloading probability distribution.
Thus, we denote by P = [P_jk] ∈ [0, 1]^{m×m} the offloading probability matrix, which has the same dimensions as A. The element P_jk denotes the probability that vehicles in V_jk offload tasks to RSU-k; correspondingly, the probability that these vehicles offload tasks to RSU-j is 1 − P_jk. All elements on the main diagonal of P equal 1, owing to the unique offloading choice in the exclusive areas.
From the adjacency matrix A and the offloading probability matrix P, the expected offloading workload on RSU-j can be written as

E(C_j) = A(:, j)ᵀ P(:, j) + A(j, :)(1_m − P(j, :))ᵀ, (24)

where (:, j) and (j, :) denote the j-th column and row vectors of a matrix. To account for the context transfer delay Δd_i of all vehicles offloading to RSU-j, Assumption 2 yields the total estimated context transfer delay at RSU-j, denoted by ΔD_j, through the probability-based workload in the overlapped areas:

ΔD_j = Σ_i Δd_i ∝ A(j, :)(1_m − P(j, :))ᵀ, (25)

where the proportionality constant is the context transfer delay ratio. We can further approximate the constraint (6i) on the maximal number of connected vehicles at an RSU by a new constraint on the maximal expected workload at the RSU, where C̄_i is the mean computational workload of vehicle tasks. Combining the results in (22) and (24), the optimal computation time for all vehicles offloading to RSU-j, denoted by T_j^{o*}, can be estimated as in (28). Equation (28) corresponds to the computation part of the objective function in (P0) and is determined by the offloading probabilities in the overlapped areas. From Assumption 1, with an optimal RA solution determined by (P3) and (P4), the TO problem in (8) can be transformed into deriving the offloading probability distribution that minimizes the task computation time plus the context transfer delay, which we denote as problem (P5). Note that (P5) is not identical to (8), since the communication part is ignored. However, the focus of (P5) is to derive a probability-based offloading solution, which can be leveraged to estimate the offloading workload E(C_j) through (24); the Lagrangian variables λ_j can then be evaluated with (21), which supports the decentralized RA approach in Section III.
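For concreteness, the expected-workload computation in (24) can be sketched in a few lines of NumPy. The helper name `expected_workload` is ours, and A and P are assumed to follow the conventions above (diagonal of P equal to 1):

```python
import numpy as np

def expected_workload(A, P):
    """E(C_j) per RSU, following (24): workload headed into RSU-j that
    chooses it, plus workload leaving RSU-j that still chooses it."""
    m = A.shape[0]
    ones = np.ones(m)
    E = np.empty(m)
    for j in range(m):
        inbound = A[:, j] @ P[:, j]          # A(:,j)^T P(:,j)
        stay = A[j, :] @ (ones - P[j, :])    # A(j,:)(1_m - P(j,:))^T
        E[j] = inbound + stay
    return E

# Toy example with m = 2 RSUs: exclusive workloads a_00 = 3 and a_11 = 4,
# shared-area workloads a_01 = 2 and a_10 = 1 (diagonal probabilities are 1).
A = np.array([[3.0, 2.0],
              [1.0, 4.0]])
P = np.array([[1.0, 0.5],
              [0.25, 1.0]])
print(expected_workload(A, P))
```

Note that Σ_j E(C_j) = Σ_{j,k} a_jk, since every unit of workload is assigned to exactly one of its candidate RSUs, which gives a quick sanity check on an implementation.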
Problem (P5) can be solved at the RSUs in a coordinated way before each offloading period. Its convexity can be confirmed similarly to Lemma 1 by verifying that the Hessian is positive semidefinite; the proof is omitted for brevity. The value of P can be determined either by convex optimization or by solving, in matrix form, the linear equations derived from the first-order optimality condition. Both methods are efficient and give the vehicles in the overlapped areas a probability-based TO solution.
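To give a feel for how such a continuous relaxation can be solved, the sketch below runs projected gradient descent on a simplified stand-in for (P5). Since the paper's exact objective is not reproduced here, it minimizes the convex load-balancing surrogate Σ_j E(C_j)² over P ∈ [0, 1]^{m×m} with the diagonal fixed to 1; the function name and step-size choices are illustrative assumptions:

```python
import numpy as np

def balance_offloading(A, steps=2000, lr=0.01):
    """Projected gradient descent on the surrogate sum_j E(C_j)^2.
    For j != k: d/dP[j,k] sum_j E_j^2 = 2 * a_jk * (E_k - E_j),
    since raising P[j,k] shifts a_jk worth of workload from j to k."""
    m = A.shape[0]
    P = np.full((m, m), 0.5)
    np.fill_diagonal(P, 1.0)                  # exclusive areas: single choice
    for _ in range(steps):
        # E(C_j) as in (24), vectorized over j
        E = (A * P).sum(axis=0) + (A * (1.0 - P)).sum(axis=1)
        grad = 2.0 * A * (E[None, :] - E[:, None])
        np.fill_diagonal(grad, 0.0)
        P = np.clip(P - lr * grad, 0.0, 1.0)  # project back onto [0, 1]
        np.fill_diagonal(P, 1.0)
    return P
```

On a two-RSU instance with exclusive loads 1 and 1 and a shared load a_01 = 4, the balanced solution sends half of the shared workload each way (P_01 ≈ 0.5), mirroring the load-balancing behavior the paper attributes to (P5).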
Proposition 2: The difference between the optimal value J(P*) of (29) under stochastic offloading and the optimal value J*(φ) of (8) under the binary offloading matrix is bounded by inequality (30), where var(C_j) is the variance of the estimated workload on RSU-j,

var(C_j) = Σ_{k∈{j−1, j+1}} [P_kj(1 − P_kj)a_kj + P_jk(1 − P_jk)a_jk],

and σ > 1.
Proof: Please refer to Appendix C.
Remark 2: Inequality (30) bounds the optimality error of our probability-based offloading approach. Here σ denotes the number of standard deviations away from the mean; for instance, when σ = 2, the probability-based offloading approach has more than a 75% chance of being within two standard deviations of the true optimal value J*(φ). A general trend in the optimality error can be observed. Notice that var(C_j) is affected by P, which is correlated with the context transfer delay ratio and the workload distribution A. With a higher delay ratio and an uneven distribution of workload in the exclusive areas (i.e., a large variance of the main diagonal elements of A), P_jk approaches either 0 or 1. Such a scenario gives a small var(C_j), and our approach tends to reach the optimal point more closely. In particular, when P_jk ∈ {0, 1}, ∀j, k ∈ M, the overlapped scenario degenerates to the non-overlapped one described in Section IV-A. The influence of the delay ratio is also examined numerically in Section V-D.

Algorithm 1: TO Coordination at RSU-j.
Data: Updated location of the vehicle p_i
Result: Probability P(j, :), P(:, j) and Lagrangian variable λ_j
1: Collect the updated adjacency matrix A through the offloading workload prediction;
2: Obtain the offloading probability matrix P by solving (P5);
3: Update E(C_j) and λ_j through (24) and (21).
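The probabilistic guarantee in Remark 2 can be checked numerically. Following the proof of Proposition 2, each edge's contribution is modeled as a binomial draw; the Monte-Carlo sketch below (a hypothetical helper with illustrative parameters) verifies that the realized workload falls within σ standard deviations of its mean far more often than the Chebyshev floor of 1 − 1/σ²:

```python
import numpy as np

def within_sigma_fraction(a, p, sigma=2.0, n=20000, seed=1):
    """Fraction of Monte-Carlo workload draws within sigma standard
    deviations of the mean, for a Poisson-binomial-style sum where
    edge workload a[k] is offloaded here with probability p[k]."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a)
    p = np.asarray(p)
    draws = rng.binomial(a, p, size=(n, a.size)).sum(axis=1)
    mu = (a * p).sum()                  # mean of the workload
    var = (a * p * (1.0 - p)).sum()     # variance, matching the terms in var(C_j)
    return np.mean(np.abs(draws - mu) <= sigma * np.sqrt(var))

# Edge workloads incident to one RSU and their offloading probabilities
frac = within_sigma_fraction([30, 50, 20], [0.3, 0.7, 0.5], sigma=2.0)
print(frac)  # Chebyshev guarantees at least 0.75; the empirical value is higher
```

The gap between the empirical fraction and the 0.75 floor reflects that Chebyshev's inequality is distribution-free and therefore conservative for sums of many independent terms.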

C. Overall Solution Algorithm
Based on the analysis in the sections above, we summarize the overall solution and present the algorithms for the vehicles and the RSUs, respectively. Algorithms 1 and 2 give the pseudo-code of the decentralized joint task offloading and resource allocation solution within one task period. The RSUs first complete the coordination task in Algorithm 1 and broadcast the updated information to the vehicles in their coverage areas, so that each vehicle can determine its task offloading and resource allocation actions when a task arises. Note that all control variables are determined in Algorithm 2 at the vehicles, which enables highly efficient decentralized control.

V. PERFORMANCE EVALUATION
We simulate a 1 km stretch of highway with 10 evenly distributed RSUs. Each RSU is equipped with 10 Nvidia Jetson TX2 NX modules as the edge computing server [11]. The computing capacity of each module is 1 Gcycle/s, and its power consumption when fully utilized is 25 W. Based on these specifications, the remaining parameters are listed in Table II. We set the allocated communication bandwidth of each vehicle to B_p = 2 MHz and the average noise power to δ_j = 2 × 10^{−13} W. According to [15], the nominal channel gain h̄_ij is expressed by the free-space path loss model, with A_d = 4.11 as the antenna gain, f_c = 915 MHz as the carrier frequency, and d_e = 2.8 as the path loss exponent. The channel gains used in our model are then generated from the Rayleigh fading channel model as h_ij = h̄_ij α, where α is an independent random channel fading factor following an exponential distribution with unit mean.
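The channel model above can be reproduced with a short NumPy sketch. The function name is ours; the constants A_d = 4.11, f_c = 915 MHz, and d_e = 2.8 are the stated simulation values:

```python
import numpy as np

def channel_gains(distances_m, A_d=4.11, f_c=915e6, d_e=2.8, seed=0):
    """Rayleigh-faded channel power gains: h = h_bar * alpha, with the
    nominal gain h_bar from the free-space path loss model and
    alpha ~ Exp(1), a unit-mean fading factor."""
    rng = np.random.default_rng(seed)
    c = 3e8  # speed of light, m/s
    d = np.asarray(distances_m, dtype=float)
    h_bar = A_d * (c / (4.0 * np.pi * f_c * d)) ** d_e
    alpha = rng.exponential(1.0, size=d.shape)
    return h_bar, h_bar * alpha

h_bar, h = channel_gains([20.0, 50.0, 100.0])
```

The nominal gain decays with distance at exponent d_e, while the unit-mean fading factor leaves the expected gain equal to h̄_ij.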
For realistic evaluation, we adopt the real-time energy management control module proposed in [43], where the power and thermal management problem of a connected hybrid electric vehicle (HEV) system is investigated with a model-based optimization approach. For this type of task [43], the amount of data to be processed is known before execution starts, and the amount of data to be transmitted is L_i = 1 Mbit. From the numerical results in [43], we estimated the average required CPU cycles C_i of a task at three levels, {1.5, 1.75, 2.0} Gcycles, where each level represents a different planning-horizon length. A random value in the range [−0.1 C_i, 0.1 C_i] is added to C_i to simulate the uncertain convergence times of the optimization task. Without loss of generality, the update interval of a motion planning iteration is set to 2 s, which defines the task period and the completion deadline ΔT. Similar simulation settings can be found in [6], [19], [44], showing that the parameters of our task model are realistic and practically relevant. According to [39], the limit on schedulability is approximately U_j = 0.7.
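The task model can be encoded directly; `sample_task_workload` is a hypothetical helper that picks one of the three workload levels and applies the ±10% jitter described above:

```python
import numpy as np

LEVELS_GCYCLES = (1.5, 1.75, 2.0)  # three planning-horizon lengths

def sample_task_workload(level, rng):
    """Required CPU cycles C_i (in Gcycles) for one task: the level's
    nominal value plus a uniform +/-10% perturbation modeling the
    uncertain convergence time of the onboard optimization."""
    base = LEVELS_GCYCLES[level]
    return base * (1.0 + rng.uniform(-0.1, 0.1))

rng = np.random.default_rng(42)
C_i = sample_task_workload(1, rng)  # one task at the 1.75 Gcycle level
```

Each sampled workload stays within 10% of its level's nominal value, matching the jitter range in the setup.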
To evaluate the efficiency of the proposed Decentralized Offloading approach, we compare its performance against the following baselines.
1) Random Offloading: Each vehicle is randomly assigned an available offloading decision. Each RSU then independently optimizes its communication and computation resources [45].
2) Myopic Offloading: Each vehicle offloads to the nearest RSU. Each RSU then independently performs resource optimization [9].
3) Enumeration: An exhaustive search over all possible offloading decisions finds the global optimum. Due to its high computational complexity, this method is evaluated only in scenarios with few vehicles.
For each set of parameters, we randomly generate 50 traffic scenarios. Only overlapped scenarios are evaluated and shown, since the non-overlapped ones can be trivially decoupled. All the optimization algorithms in the case study were executed on a laptop PC with an Intel(R) Core(TM) i5-8300H CPU @ 2.30 GHz (4 cores) and 16.0 GB of RAM. The average performance over all runs is reported.

A. Analysis of Lagrangian Variables
In Proposition 1, we study the Lagrangian variable and give a closed-form expression. To validate its correctness, we compare the values with the converged solution obtained by the primal-dual gradient method, which is often applied in decomposition-based approaches [46]. Note that our problem (P1) can also be solved adequately with the gradient method; however, such approaches generally require a relatively long time to iteratively converge to high accuracy, hence we utilize the primal-dual gradient method only for validation.
As shown in Fig. 5, the Lagrangian variable λ_j2 varies with the assigned energy constraint E_j^o and task load Σ_{i=1}^n C_i. The initial value of λ_j2 is set to 0.1 with the initial step size SS_0 = 0.002. For clearer illustration, we also halve the initial step size when E_j^o > 120 J to avoid oscillation and reach faster convergence. To guarantee convergence, we apply a diminishing step size SS_k = SS_0 · k^{−0.5+γ}, where k is the iteration number and γ = 0.3 is a positive constant; a larger γ gives a larger step size and may lead to stronger oscillation. The stopping criterion is based on ε-suboptimality [40], with the relative tolerance set to ε_rel = 1 × 10^{−4}. Some of the curves have larger overshoots than others, mainly because the same initial value of λ_j2 is used in all runs. By comparing with the values from (20), it can be confirmed that our derived closed-form values coincide with the results of the primal-dual gradient method.
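The diminishing-step-size schedule can be illustrated on a toy dual problem. The sketch below is not the paper's (P3): it runs dual (sub)gradient ascent with SS_k = SS_0 · k^(−0.5+γ) on min Σ_i c_i/u_i s.t. Σ_i u_i ≤ U, u_i ∈ (0, 1], whose interior optimum admits the closed form λ* = (Σ_i √c_i / U)², so the iterate can be checked against it. All names and numbers are illustrative:

```python
import numpy as np

def dual_ascent(c, U, lam0=0.1, ss0=0.05, gamma=0.3, iters=4000):
    """Dual subgradient ascent with diminishing step size
    SS_k = ss0 * k**(-0.5 + gamma) for:
        min sum_i c_i / u_i   s.t.  sum_i u_i <= U,  u_i in (0, 1].
    For fixed lambda, the Lagrangian minimizer is
    u_i(lambda) = min(1, sqrt(c_i / lambda))."""
    c = np.asarray(c, dtype=float)
    lam = lam0
    for k in range(1, iters + 1):
        u = np.minimum(1.0, np.sqrt(c / lam))
        # dual subgradient is the constraint violation sum(u) - U
        lam = max(1e-12, lam + ss0 * k ** (-0.5 + gamma) * (u.sum() - U))
    return lam, np.minimum(1.0, np.sqrt(c / lam))

c, U = [1.0, 1.0], 1.2
lam, u = dual_ascent(c, U)
lam_closed = (np.sum(np.sqrt(c)) / U) ** 2  # interior-optimum closed form
```

As in Fig. 5, the exponent −0.5 + γ trades off convergence speed (larger steps) against oscillation near the optimum; here the iterate settles onto the closed-form multiplier.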

B. Optimization Result and Runtime Comparison
To evaluate the optimality of our proposed method, we compare its performance with the methods above. Since the Enumeration method searches all possible offloading decisions and its runtime grows exponentially with the number of vehicles, the number of vehicles is set to 30, with at most 15 in overlapped areas. The response time is shown in Fig. 6. We set C_i = 1.5, 1.75, and 2.0 Gcycles, respectively, and report the task response time over 50 scenarios for each approach. As the task workload increases, the task response time becomes longer. Our proposed Decentralized Offloading performs close to the Enumeration method, which searches over all possible offloading decisions and thus has the best performance. Compared to Myopic Offloading, Decentralized Offloading achieves slightly shorter task response times in a decentralized way.
The average runtime per scenario for each algorithm is reported in Table III. The Enumeration method consumes the longest time, around 1000 times longer than Decentralized Offloading, even in this light-traffic scenario. Decentralized Offloading runs faster than Myopic Offloading, mainly because the resource allocation step is performed in parallel at the vehicles in a decentralized manner instead of at the RSU. Fig. 7 shows the offloading behavior with different numbers of vehicles. When the number of vehicles is below 40, the average response times of Decentralized Offloading and Myopic Offloading differ only slightly. In general, Myopic Offloading has an advantage over Decentralized Offloading when the number of vehicles is below 20, mainly because in light traffic every vehicle can obtain adequate computing resources; in this regime, communication time has a more significant influence on performance than computation time, and Myopic Offloading minimizes the communication delay by offloading each vehicle's tasks to the nearest RSU. However, when more vehicles offload under the same constrained computation resources, Decentralized Offloading performs better with shorter completion times. Especially when the number exceeds 50, vehicles in overlapped areas can collaboratively select the offloading destination and promote load balancing among the RSUs.

C. Effect of the Number of Vehicles
Since the energy, computation, and communication resources are constrained in the above scenarios, as the number of vehicles increases, the RSUs cannot serve all tasks while simultaneously satisfying the timing requirements. A service outage [11] happens when a task cannot be completed by the selected RSU.

D. Effect of RSU Capacity and User Preference
In this section, we fix the number of vehicles to 40 to mitigate the service outage problem that arises when the task workload is high. We examine the offloading performance for varied RSU capacities in terms of CPU frequency f_j and energy consumption constraint E_j^o, under different task computation workloads C_i. Two types of RSU configuration, homogeneous and heterogeneous servers [10], are evaluated and compared. In the homogeneous scenario, all servers have the same CPU speed of 10 Gcycles/s, while in the heterogeneous scenario the CPU speeds are randomly selected from {5, 10, 15} Gcycles/s. With the change of CPU speed, the energy constraint E_j^o is adjusted accordingly to {125, 250, 375} J to keep the same level of stringency.
Average response times as C_i increases are shown in Fig. 9. The performance in the homogeneous scenario is better than in the heterogeneous scenario, mainly because the latter has an unbalanced deployment of computation resources. Meanwhile, the gap between Decentralized Offloading and the other methods widens as the task workload grows in both scenarios, and more markedly in the heterogeneous one. This is because when vehicles are close to RSUs with low computation capacities, Myopic Offloading selects the nearest RSU, whereas Decentralized Offloading may choose a farther RSU to balance the workload, which improves the latency performance.
The effects of the user preference η_i in (6a) and the context transfer delay ratio are also studied, and the results are shown in Fig. 10. With the increase of η_i in Fig. 10(a), the energy-saving demand on communication is emphasized; thus, the advantage of Decentralized Offloading over Myopic Offloading shrinks. Fig. 10(b) shows the impact of the context transfer delay ratio on the response time. Both Myopic Offloading and Random Offloading experience a nearly proportional increase with a higher delay ratio, while Decentralized Offloading exhibits sub-linear growth. It nearly stabilizes once the ratio exceeds 0.2, mainly because the solution of the task offloading optimization problem then tends to avoid offloading tasks to the receding RSU to reduce the chance of context transfer.

E. Effect of Energy Constraint
In Fig. 11, the influence of the energy constraint on the optimization performance is analyzed. We fix the number of vehicles to 50 and C_i = 2.0 Gcycles. The values on the horizontal axis decrease along the positive direction, indicating an increasingly constrained scenario. Fig. 11(b) compares the average energy consumption at the RSUs, which indicates the load balancing among the edge devices. Since there are 10 RSUs, from (1) and (2), the average energy consumption per RSU is 250 J. In Fig. 11(b), when E_j^o = 500 J, all tasks can be executed at the maximal frequency (i.e., u_ij = 1), and the average energy consumption reaches 250 J for all three policies. When the energy constraint becomes more stringent, owing to the uneven workload distribution among RSUs, tasks can no longer all be executed at the same frequency, and the average utilization gradually decreases. Among the three policies, Decentralized Offloading has the best load balancing performance due to its advantage in task offloading. Fig. 11(a) evaluates the average task response time as the energy constraint varies. When the energy limit is high, all tasks can be computed with adequate resources, and the average response time is short; in this scenario, Myopic Offloading performs slightly better than Decentralized Offloading because of its lower communication latency. However, when E_j^o gets smaller, Decentralized Offloading shows better load balancing and hence a shorter response time.

VI. CONCLUSION AND FUTURE WORK
This study investigates the optimal computation task offloading and resource allocation problem in vehicular edge computing systems. A decentralized strategy is proposed to minimize the overall response time while guaranteeing the deadline and energy limitations. The original MINLP is converted into tractable convex subproblems through hierarchical decomposition, which enables the problem to be solved in a decentralized way at the vehicle side. In addition, we study the coordination problem among RSUs under two deployment architectures and reach a global coordination on load estimation in multi-RSU scenarios. The probability-based approach provides a near-optimal solution with high efficiency, and the simulation results verify that our approach outperforms the baseline methods. In future work, the system's reliability will be investigated under disturbances and stochastic behaviors. We will also investigate how to handle the offloading of safety-related tasks.

APPENDIX A
PROOF OF LEMMA 1
In (P1), the non-convexity arises from E^c_ij(r_ij). After the substitution in Section III-A, r_ij is expressed through ω_ij. From (11), we rewrite E^c_ij as a function of ω_ij. Taking the second-order derivative of E^c_ij(ω_ij) yields (32), and (10f) gives the feasible range of ω_ij. Since ω_ij is positive, (32) is always positive. Besides E^c_ij(r_ij), the other r_ij-related terms, such as T^c_ij(r_ij) in (10a) and (10d), become linear in ω_ij after the substitution. Therefore, in (P2), the objective function and the inequality constraints are all convex, so (P2) is a convex problem, which allows us to solve the primal via the dual.

APPENDIX B PROOF OF PROPOSITION 1
We focus on the computation subproblem (P3), since the communication part is not correlated with λ_j. The convexity of (P3) can be verified as in Lemma 1 by checking that the Hessian is positive semidefinite. Let u* be the optimal solution to the primal problem and λ*_j the optimal solution to the dual problem. The Karush-Kuhn-Tucker (KKT) conditions of the problem are given in (34), where (34b) and (34c) state the complementary slackness conditions. In our problem, the energy constraint is lower than the nominal power, which means it is always tighter than the utilization constraint; thus, the energy consumption will exceed E_j^o before the utilization reaches the maximal threshold U_j. We first assume that the strict inequality holds for (10b), treating it as an inactive constraint; since it does not bind, we conclude that λ*_j1 = 0. Meanwhile, since (10c) is an active constraint, complementary slackness requires Σ_{i∈V_j} E^o_ij(u_ij) − E_j^o = 0. From (34a), we obtain (35). Since u_i ∈ (0, 1], ∀u_i ∈ u, we have (36). Also, to ensure that our assumption holds, u* should satisfy the strict inequality in (10b), giving (37). Therefore, the dual subproblem can be expressed as (38), subject to the bounds in (36) and (37), with variable λ_j2 ∈ R_+, and the optimal dual variable follows in closed form. In our problem, to meet the schedulability requirement, the total computation workload cannot exceed the maximal capacity of an RSU under the energy constraint, which gives an upper bound on λ_j2; (36) gives a lower bound. The satisfaction of both bounds is examined in Section V.
APPENDIX C
PROOF OF PROPOSITION 2
Besides the self-loop a_jj, the offloading workload on RSU-j comes from the four connected edges. Under the probability-based offloading policy, each of these contributions follows a binomial distribution, e.g., a_{(j−1)j} P_{(j−1)j} ∼ B(a_{(j−1)j}, P_{(j−1)j}). Their summation in (39) gives the total expected offloading demand on RSU-j and follows a Poisson binomial distribution, with mean μ(C_j) = E(C_j) and variance

var(C_j) = Σ_{k∈{j−1, j+1}} [P_kj(1 − P_kj)a_kj + P_jk(1 − P_jk)a_jk], (40)

where, in the matrix form of (40), the operator ∘ denotes the Hadamard product.
The ideal computation delay in J*(φ) is given by (27) with the expected workload E(C_j), while the actual offloading workload C_j follows the Poisson binomial distribution. With Assumption 1, an approximate difference between the optimal values J(P*) and J*(φ) can be derived. From the Bienaymé-Chebyshev inequality [47], this yields the bound on the optimal value error stated in (30), where σ > 1 represents the number of standard deviations from the mean.