A Markov Decision Process Solution for Energy-Saving Network Selection and Computation Offloading in Vehicular Networks

Vehicular Edge Computing (VEC) integrates edge computing facilities into vehicular networks (VNs), bringing data-intensive and latency-critical applications and services to end-users. Though VEC brings several benefits in terms of reduced task computation time, energy consumption, backhaul link congestion, and data security risks, VEC servers are often resource-constrained. Therefore, selecting proper edge nodes and the amount of data to be offloaded becomes important for realizing the benefits of VEC processing. However, with high-mobility vehicles and dynamically changing vehicular environments, proper VEC node selection and data offloading can be challenging. In this work, we consider a joint network selection and computation offloading problem over a VEC environment for minimizing the overall latency and energy consumption during vehicular task processing, considering both user and infrastructure side energy-saving mechanisms. We model the problem as a sequential decision-making problem and incorporate it in a Markov Decision Process (MDP). Numerous vehicular scenarios are considered based upon the users' positions, the states of the surrounding environment, and the available resources for creating a better environment model for the MDP analysis. We use a value iteration algorithm for finding an optimal policy of the MDPs over an uncertain vehicular environment. Simulation results show that the proposed approaches improve the network performance in terms of latency and consumed energy.


I. INTRODUCTION
WITH the rapid growth of the automotive industry, fueled by the demands from the end-users and the integration of innovative technologies like the Internet of Things (IoT) and modern wireless communication technologies, automated vehicles with advanced communication and computation technologies are now part of the vehicular networks (VNs) [1]. These new vehicles are capable of providing new services and applications to vehicular users (VUs) aiming at increasing road safety, avoiding traffic congestion, reducing the pollution level, providing new infotainment services, etc. However, modern applications and services come with stringent requirements in terms of high data processing and critical latency bounds. With limited onboard resources, vehicles alone cannot cope with such requirements and need support from additional platforms, e.g., cloud and edge computing [2].
Though cloud computing facilities have enormous computing resources, they are located deep inside the core networks, and high transmission delays often limit their use for latency-critical VNs. Edge Computing (EC) technology can address the cloud computing problems by bringing the cloud resources into the proximity of end-users. EC has achieved great success in wireless networks when serving users with new innovative services [3]. In VNs, EC facilities can be enabled through the deployment of Road Side Units (RSUs) along the road hosting several EC servers [4]. This approach, known as vehicular edge computing (VEC), has the potential to serve VUs with reduced transmission delays and energy requirements. The importance of VEC in VN scenarios is highlighted by several recent works, mainly for enabling latency-critical applications [5], [6].
VEC technology provides a computation environment to VUs for processing their tasks. VUs can transmit a portion of their computation load to the nearby VEC servers while performing the remaining computation locally. VEC servers perform the processing operations on behalf of the VUs and return the results. This approach is known as partial computation offloading, which allows VUs to complete a task processing operation in collaboration with VEC servers to reduce the overall latency and energy requirements during processing [4]. However, when coping with a large number of VUs demanding computation offloading services from VEC servers with limited computation/communication resources, energy, storage capabilities, and coverage range, several new challenges arise in VEC-enabled VNs. These challenges are mainly characterized by a proper selection of when, where, and how much data needs to be offloaded to the VEC servers for adequate performance. This problem is also known as joint network selection and computation offloading, which aims to find a proper VEC server and the amount of data to be offloaded over dynamic vehicular environments [4].
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In recent times, Machine Learning (ML) algorithms, especially Reinforcement Learning (RL), have gained considerable popularity for solving such highly complex problems over uncertain vehicular environments [5], [7], [8], [9], [10], [11]. The decision of selecting a proper RSU node for computation offloading depends upon several factors such as the VU's position, speed, nearby VUs, the number of RSUs available for offloading, etc. The exact amount of a task to be offloaded towards the selected RSU can be impacted by these factors, which often change over time. If a VU is under the coverage of multiple RSUs, the most suitable RSU can be selected by sequentially choosing one after another. Also, a VU can make sequential decisions for finding a proper amount of data to be offloaded towards the RSU server. Every decision made by a VU can alter the surrounding environment's state, and can be characterized by some reward (i.e., an increase or decrease in the task processing time and energy). Therefore, finding the proper edge node (EN) and the corresponding data to be offloaded in the dynamic vehicular environment can be considered as a sequential decision-making problem, and RL-based solution methods can be used to solve it [12].
Among several RL-based techniques, Markov Decision Processes (MDPs) can be effectively used to solve the VEC offloading problem [13]. The main components of an MDP model are the state space, action space, reward function, and environment dynamics [14]. At a particular instant, the MDP agent, being in a current state, performs a particular action and receives observations in terms of a state change and a possible reward based upon the current state and the action performed. Over time, the MDP agent aims to learn an optimal policy function that maps states to optimal actions for maximizing the reward. Proper environment dynamics, modeling the state transition probabilities from one state to another based upon the taken action, are required.
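To make the four MDP elements concrete, the following minimal Python sketch solves a tiny tabular MDP with value iteration. It is an illustration only, not the model developed in this paper: the two states, the two actions (compute locally or offload), the transition probabilities, the rewards, and the discount factor are all hypothetical values chosen for the example.

```python
from typing import Dict, List, Tuple

# P[s][a] is a list of (probability, next_state, reward) triples.
Transitions = Dict[int, Dict[int, List[Tuple[float, int, float]]]]


def q_value(outcomes, V, gamma):
    """Expected return of one action given the current value estimates."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)


def value_iteration(P: Transitions, gamma: float = 0.9, tol: float = 1e-8):
    """Return (V, policy) for a finite MDP with known dynamics."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Bellman optimality backup over the actions available in s.
            best = max(q_value(out, V, gamma) for out in P[s].values())
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {
        s: max(P[s], key=lambda a: q_value(P[s][a], V, gamma)) for s in P
    }
    return V, policy


# Hypothetical toy model: state 0 = task pending, state 1 = task done.
# Action 0 = compute locally, action 1 = offload. Rewards are negative costs.
TOY_MDP: Transitions = {
    0: {
        0: [(1.0, 1, -4.0)],                  # local: always finishes, costly
        1: [(0.8, 1, -1.0), (0.2, 0, -2.0)],  # offload: cheaper, may need a retry
    },
    1: {0: [(1.0, 1, 0.0)]},                  # absorbing terminal state
}

V, policy = value_iteration(TOY_MDP)
```

In this toy model the offloading action is preferred in the pending state because its expected discounted cost is lower than the fixed local cost. The sketch assumes the transition probabilities are known, which is exactly the role played by the environment dynamics.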
In recent times, with the integration of modern technologies, such as IoT, and new communication modes, including Vehicle-to-Vehicle (V2V), Vehicle-to-Road Side Units (V2R), Vehicle-to-Infrastructure of Cellular Networks (V2I), Vehicle-to-Sensors (V2S), and Vehicle-to-Person (V2P), VUs can learn several environment parameters and also share important information [15]. VUs can learn about the nearby competing VUs and their offloading experiences, RSU locations and availability, etc. This information can be utilized to create better environment dynamics in MDP-based approaches for solving the VEC offloading problem. With this in mind, in this work, we propose an MDP-based model for the computation offloading problem over the VEC environment. In particular, considering a proper mobility model and different communication technologies, we identify several VU scenarios. With the help of these scenarios, we define a multi-dimensional MDP model, where time-dependent transition probability equations are proposed.
The scenario under consideration is a VEC-enabled VN with a set of VUs, RSUs, and one macro base station (MBS). VUs generate tasks, and, with limited onboard resources, they request services from the nearby RSUs and MBS. Additionally, VUs are characterized by high mobility with varying speeds. Each VU is covered by multiple RSUs, while each RSU can serve multiple VUs with its limited resources. RSUs are also supposed to operate in different power-saving modes (i.e., standby, active, etc.) for reducing energy consumption during operations. With limited RSU resources and VU mobility, finding a proper VU-RSU pair can improve the performance of a resource-constrained VN. At the same time, offloading an optimal amount of data to the selected RSU can further increase the performance in terms of energy and latency reduction.

A. Related Work
The problems of network selection and of defining the offloading parameters, aiming at minimizing the latency and energy costs, have mostly been addressed separately in the literature.
In [16], the authors have considered a computation offloading problem for a vehicular scenario where the aim is to minimize a priority-weighted delay performance during the task processing operations. However, the energy performance is neglected. In another case, in [17], the authors have proposed a learning-based, energy-efficient task offloading strategy for the vehicular case without optimizing the delay performance. In many such cases, authors have minimized either the latency or the energy costs individually and often neglected the cost of the edge-based servers. With the new services and applications in VNs, having critical latency requirements along with larger computations with higher energy costs, minimizing both latency and energy costs together is highly important. In recent times, several researchers have tried to optimize the latency and energy costs in mobile vehicular networks for different settings [10], [11], [18], [19], [20], [21]. In [10], the authors have performed the joint optimization of latency and energy costs for VEC-based systems by finding the optimal portion of the offloading data. However, it is assumed that the VUs select the nearest ENs for offloading their data. In [11], the authors have proposed a dynamic offloading strategy for the vehicular scenario based upon Imitation Learning techniques. An energy-efficient strategy is proposed for latency-constrained vehicular tasks. However, the authors have only optimized the energy performance while considering the latency as a constraint. Moreover, the authors have performed binary offloading operations without considering any energy-saving mechanism on the edge node side. In [18], the authors have studied the energy-latency tradeoff for computation offloading operations in dynamic vehicular environments. However, this work is based on binary offloading operations where vehicular tasks are processed either by VUs or by edge servers only. In [19], the authors have studied the delay and energy performance of VEC systems for federated learning-based applications, neglecting the edge node side energy costs. In [20], [21], the authors have addressed the joint network selection and offloading problem for the minimization of latency and energy costs. However, the edge node side energy cost is not taken into account. In [22], the authors have studied the computation offloading problem in heterogeneous VEC scenarios, where multi-armed bandit theory is applied, and online and off-policy learning algorithms are proposed for the network selection problem. However, the authors have performed a network selection operation by assuming a complete task migration toward the selected node. Such solutions have limited performance since computation offloading and network selection can impact each other. Offloading the optimal amount of data to the incorrect edge server can reduce the performance of a system. Likewise, selecting a proper EN without optimizing the amount to be offloaded cannot be considered an optimal solution. Recently, in [23], the authors have proposed a multi-task learning-based solution for solving the computation offloading problem over a resource-limited EC scenario. The problem is formulated as a mixed-integer nonlinear programming problem, and a multi-task learning-based feed-forward neural network model is proposed for optimizing the offloading and resource allocation decisions. In another case, in [24], a two-tier edge intelligence-empowered autonomous driving framework is proposed for assisting VUs with proper offloading decisions, resource allocations, etc.
In the past, some works have considered joint optimization approaches over vehicular networks for different cases, e.g., joint computation offloading and user association [25], joint computation offloading and task scheduling over VEC [26], joint resource allocation and computation offloading [27], and joint computation offloading and caching [28]. In some works, authors have considered a joint network selection and computation offloading problem over a dynamic vehicular scenario [29], limited to the optimization of the delay performance. In [30], the authors have developed an efficient partial computation offloading and adaptive task scheduling algorithm for vehicular services. A two-sided matching algorithm is proposed for transmission scheduling, and convex optimization is used for finding a partial offloading ratio. However, most of these works considered traditional heuristic or meta-heuristic approaches, which result in limited performance. In addition, most of the works are mainly concerned with the VU energy and completely neglect the RSU energy performance.

B. Motivation
In the current vehicular literature, several works have optimized the performance in terms of either latency or energy costs. In some cases, authors have considered the joint minimization of latency and energy costs; however, these studies are limited to either the network selection or the computation offloading process only. Additionally, in some cases, the energy costs only include the vehicular side energy costs with some assumptions on the edge node energy resources. Also, several of these works have used traditional energy consumption models while analyzing the energy costs at the edge facilities. This motivates us to formulate a joint network selection and computation offloading problem with latency and energy cost minimization and with an advanced energy consumption model at the EN.
In some cases, authors have considered heuristic or metaheuristic approaches with limited performance over complex vehicular scenarios. In other cases, authors have considered advanced ML-based frameworks such as RL or DRL methods with model-free solution approaches. Such methods can suffer from a higher convergence time, computation complexity, and unstable behavior, mainly due to the high dynamicity of vehicular environments. This motivates us to propose a novel multi-dimensional MDP model based on local vehicular data with time-dependent state transition probabilities.
Therefore, in this work, we propose a joint network selection and computation offloading strategy over a mobile vehicular network for the overall latency and energy minimization of both vehicular and infrastructure nodes, with an additional energy-saving mechanism at the edge infrastructure. An original MDP-based RL framework with time-dependent state transition probabilities is proposed, where local vehicular environment parameters are used effectively.

C. Contributions
The main contributions of this work are:
• Joint network selection and computation offloading problem formulation: We define a joint network selection and computation offloading problem for minimizing the overall latency and energy consumption over the VN as a constrained optimization problem, where ENs can be in different energy-saving states, i.e., standby or active, for a more efficient energy-saving behavior.
• MDP model with time-dependent state transition probabilities: The problem is modeled as a sequential decision-making problem and incorporated into an MDP-based model. Various elements of the MDP process, including the state space, action space, reward function, and environment dynamics with time-dependent state transition probabilities, are considered.
• V2X-based on-road scenarios: Exploiting V2X communication technologies and a proper mobility model, various on-road VU scenarios are defined for easing the burden of the higher-dimensional MDP process without hindering its performance.
• Value iteration method for MDP policy: A value iteration-based approach is used for finding the optimal policy for the MDP process. In addition, a set of benchmark methods is considered to analyze the performance of the proposed scheme.

The remaining parts of this paper are organized as follows. Section II introduces the system model and defines the optimization problem to be solved. In Section III, we define different vehicular scenarios based on the nearby environment and design the MDP elements for the considered problem. A set of time-dependent state transition probability equations is defined. In Section IV, a value iteration algorithm and the corresponding elements are detailed for finding an optimal policy for the considered MDP model. Additionally, two other benchmark methods are proposed for comparison purposes. In Section V, the numerical results obtained through computer simulations are provided and analyzed. Finally, in Section VI, the conclusions are drawn.

II. SYSTEM MODEL AND PROBLEM FORMULATION
In this work, an urban Internet of Vehicles (IoV) scenario for intelligent transportation systems with connected and intelligent VUs is considered, where a set of VUs is randomly distributed over a multi-lane road. In recent times, such urban IoV scenarios have gained a lot of attention from the vehicular research community [18], [19]. Table I lists the important notations used in the following parts. We refer to $V = \{VU_1, \ldots, VU_m, \ldots, VU_M\}$ as the set of $M$ VUs, and $R = \{RSU_1, \ldots, RSU_n, \ldots, RSU_N\}$ as the set of $N$ RSUs in the area.
The system is modeled in a time-discrete manner, and the network parameters are supposed to be constant over each time interval $\tau$, where $\tau_i$ identifies the $i$th time interval, i.e., $\tau_i = \{\forall t \mid t \in [i\tau, (i+1)\tau]\}$ [10]. By focusing on the $i$th time interval, the $m$th VU is located in the position $\{x_m(\tau_i), y_m(\tau_i)\}$, moves at a speed $v_m(\tau_i)$ along the multi-lane road path in either direction, and is equipped with a processing capability equal to $c_m$ Floating Point Operations per Second (FLOPS) per CPU cycle, while its CPU frequency is $f_m$. We have assumed resource-limited edge computing nodes equipped with multi-core computing hardware with restricted capacities and limited bandwidth resources [20], [21], [31]. Each RSU can be identified through a set of parameters, where the $n$th RSU is located at the fixed position $\{x^R_n, y^R_n\}$ with height $h^R_n$, is able to provide communication with a maximum bandwidth $B^R_n$, and has a multi-core CPU processor with $L_n$ cores with $c^R_n$ FLOPS per CPU cycle, while its CPU frequency is $f^R_n$. Similarly, the MBS can be identified through its position $\{x^M, y^M\}$, its height $h^M$, and its maximum bandwidth $B^M$; it is supposed to be equipped with a multi-core processor, where each core has a processing capability equal to $c^M$ FLOPS per CPU cycle, while its CPU frequency is $f^M$. Here, we do not put any limitation on the number of MBS CPU cores and assume that each VU can have access to only one CPU core.
The $n$th RSU has a limited coverage range $d_n$, whose value depends on the communication technology and radio-propagation environment, and it is supposed to provide VEC services to the vehicles within the coverage area. Similarly, the MBS has a coverage range $d_M$. Thus, VUs can offload data to up to $N + 1$ ENs, i.e., $N$ RSUs (i.e., $EN_1, \ldots, EN_N$) and one MBS (i.e., $EN_0$). Each $VU_m \in V$ is supposed to be active in each time interval with a probability $p_a$, within which it generates a computation task request $\rho_m(\tau_i)$ identified through the tuple $\langle D_{\rho_m}, D^r_{\rho_m}, \Omega_{\rho_m}, T_{\rho_m} \rangle$, corresponding to a task of size $D_{\rho_m}$ Byte, expected to produce a result of size $D^r_{\rho_m}$ Byte, requesting $\Omega_{\rho_m}$ CPU execution cycles, and with a maximum execution latency $T_{\rho_m}$.
In Fig. 1, a possible IoV scenario is depicted, where randomly distributed VUs are able to offload their computation tasks to the nearby ENs. Also, each VU is covered by multiple RSUs along with one MBS. VUs can communicate with ENs over V2R links and with each other through V2V links for information sharing.
VU Mobility and Sojourn Time: Due to the VU mobility, each offloading operation should be completed within the VU sojourn time, corresponding to the amount of time the VU remains under the coverage of the selected EN [32], for avoiding additional latency due to, e.g., vehicle handover, service migration, and additional signaling for managing vehicle and service mobility. The RSU handover process involves transferring the management of an active communication from one RSU to another [33]. Such handover situations can occur if a VU fails to get back the offloaded task results before it leaves the RSU coverage. Handovers can degrade the network-wide performance in terms of latency.
The mobility parameters of individual VUs often depend upon the decisions of nearby VUs. One of the most often considered mobility models for vehicular scenarios is based on the preceding car dynamics [34]. Here, we adopt a similar model for analyzing the VU mobility. If $v^v_m(\tau_i)$ and $a^v_m(\tau_i)$ represent the speed and acceleration parameters at the $i$th interval for the $m$th VU, the model considers that the $m$th VU mobility parameters depend on the motion and dynamics of the preceding VU. In this model, $a_{\max}$ is the maximum acceleration value, $v_{\max}$ is the desired velocity required for a steady traffic flow, and the relative velocity and inter-vehicular distance are taken between VUs $m$ and $m - 1$, with $l_o$ being the VU length. $\delta \in \{1, 5\}$ is the sensitivity of the driver, and $s^*$ is the desired space, defined in terms of $s_{\min}$, the desired safe space between consecutive VUs, $t_r$, the minimum reaction time headway based upon the safe distance, and $b_{\max} > 0$, the comfortable braking deceleration. In this work, the safety distance between VUs is considered as a design parameter, similar to [34]. However, the safety distance between VUs can be based upon several parameters and tradeoffs, i.e., traffic flow characteristics, VU safety demands, communication capabilities, V2V delays, etc. Interested readers can follow [35], [36] for more information. The speed and position of the $m$th VU at the $i$th interval are then updated from these mobility parameters. The distance over which the $m$th VU remains under the coverage of the $n$th EN is $D_{m,n}(\tau_i)$; it depends on the VU position and on $(x_{EN}, y_{EN})$, the location of the $n$th EN, i.e., either an RSU or the MBS. The available sojourn time for the $m$th VU follows from $D_{m,n}(\tau_i)$ and the VU speed.
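The mobility and sojourn-time quantities described above can be sketched in Python as follows. This is a hedged illustration and not the paper's exact expressions: it assumes the standard Intelligent Driver Model form of the car-following law (whose parameters match the symbols listed above), a straight road parallel to the x-axis, a circular EN coverage area of radius d_n, and constant acceleration within one interval. All function and argument names are hypothetical.

```python
import math


def idm_acceleration(v, dv, gap, a_max, v_max, b_max, s_min, t_r, delta):
    """Acceleration of a follower under the standard IDM (assumed form).

    v   : follower speed
    dv  : follower speed minus leader speed (approach rate)
    gap : bumper-to-bumper distance to the leader (already net of l_o)
    """
    # Desired space s*: safe space plus headway and braking terms.
    s_star = s_min + max(
        0.0, v * t_r + v * dv / (2.0 * math.sqrt(a_max * b_max))
    )
    return a_max * (1.0 - (v / v_max) ** delta - (s_star / gap) ** 2)


def step(x, v, a, tau):
    """Advance position and speed over one interval of length tau."""
    v_new = max(0.0, v + a * tau)        # vehicles do not move backwards
    x_new = x + 0.5 * (v + v_new) * tau  # trapezoidal position update
    return x_new, v_new


def sojourn_time(x, y, v, x_en, y_en, d_n):
    """Time left inside a circular coverage area for a VU moving along x."""
    lateral = abs(y_en - y)
    if lateral >= d_n:
        return 0.0                       # road lies outside the coverage
    half_chord = math.sqrt(d_n ** 2 - lateral ** 2)
    if abs(x - x_en) > half_chord:
        return 0.0                       # VU is currently not covered
    if v == 0.0:
        return math.inf                  # a stopped VU never leaves
    exit_x = x_en + math.copysign(half_chord, v)
    return abs(exit_x - x) / abs(v)      # remaining distance over speed
```

The sign of v encodes the travel direction, so the same function covers both directions of a multi-lane road.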

VU-EN Assignment, Offloading Process and Resource Allocation: We define a binary VU-EN assignment matrix $A(\tau_i)$ of size $M \times (N + 1)$, whose entry $a(m, n)$ equals 1 if the $m$th VU is assigned to the $n$th EN and 0 otherwise, where it is supposed that each VU is able to offload data to only one EN. It should be noted that the first column ($n = 0$) represents the assignments towards the MBS, while the remaining columns, from $n$ equal to 1 to $N$, are considered for RSUs. The number of VUs requesting services from the $n$th EN is given by $K_n(\tau_i) = \sum_{m=1}^{M} a(m, n)$. With their limited resources, RSUs can serve only a limited number of VUs before task communication and computation costs become unbearable.
We consider that $K_{\max}$ is the maximum number of VUs that can access the services of each RSU node. However, with its rich resource set, the MBS can provide services to several VUs without such limits.
We assume partial offloading, where a portion of each task can be split off and processed remotely while the remaining portion is processed locally [20], [21], [32]; the portion offloaded by the $m$th VU at $\tau_i$ is identified as $\alpha^{\rho_m}(\tau_i) \in [0, 1]$. With multiple VUs requesting services from the same EN, during the offloading process, the constraints in (5) need to be taken into account $\forall i$, $n = 1, \ldots, N$; they involve the communication resources assigned to each VU for communicating with the $n$th EN. Eqs. (5) model upper bounds on the number of connected users, the processing capacity, and the communication resources of the RSUs. The constraint (5a) refers to a system constraint for limiting the complexity of the system model. The edge infrastructure manager can define a strategy for the scenarios where the number of VUs requesting services from the same RSU node becomes higher than $K_{\max}$. In the considered vehicular scenarios, VUs are forced to perform the local computation of their whole tasks in case the limit is violated. It is worth noting that the capacity of each link depends on the specific communication technology and is out of the scope of this paper. Also, we consider that the MBS has abundant resources and is able to serve a large number of VUs without limitations.
With limited EN communication and computation resources, proper scheduling is required when multiple users access the same EN. Here, the model in (6) and (7) is used for assigning EN resources to the VUs for computation offloading; (6) and (7) give the EN resource allocation to the VUs' tasks in terms of computation capacity and bandwidth, respectively. According to (6), if the number of VUs requesting services from the $n$th EN is less than $L_n$, each can have access to a single CPU core at its full capacity. In case the number of users becomes higher than $L_n$, multiple VUs share CPU core resources. Here, $\lceil x \rceil$ is the ceiling function applied over $x$ for rounding it to the nearest integer value higher than or equal to $x$. According to (7), bandwidth resources are equally shared among all requesting VUs.
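The request counting and resource sharing rules described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the exact allocation of (6) and (7): in particular, the rule that each core is split evenly among ceil(K_n / L_n) VUs once the VUs outnumber the cores is an assumed reading of the ceiling-based sharing, and all names are hypothetical.

```python
import math


def requests_per_en(assignment):
    """Column sums K_n of a binary M x (N+1) matrix (column 0 is the MBS)."""
    return [sum(column) for column in zip(*assignment)]


def rsu_share(k_n, cores, c_per_cycle, f_hz, bandwidth_hz, k_max):
    """Per-VU (compute rate, bandwidth) at one RSU.

    Returns None when the RSU has no request or when K_max is exceeded,
    in which case the VUs would fall back to local computation.
    """
    if k_n == 0 or k_n > k_max:
        return None
    core_rate = c_per_cycle * f_hz
    # ASSUMPTION: with more VUs than cores, each core is time-shared
    # evenly by ceil(K_n / L_n) VUs; otherwise each VU gets a full core.
    sharing = math.ceil(k_n / cores)
    # Bandwidth is split equally among all requesting VUs.
    return core_rate / sharing, bandwidth_hz / k_n


# Hypothetical example: 4 VUs, one MBS (column 0) and two RSUs.
A = [
    [0, 1, 0],
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 0],  # this VU computes locally
]
K = requests_per_en(A)
```

With this matrix, the MBS serves one VU, the first RSU serves two, and the second RSU receives no request, so it is a candidate for the standby mode.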
If the $m$th VU is assigned to the MBS, i.e., $n = 0$, it can have access to a single CPU core, and it equally shares the bandwidth resources with the other connected VUs. In the following, we model the delay and energy requirements of the various operations involved in vehicular task processing with partial computation offloading.
Task Computation Model: The generic expressions for the time and energy spent for the $\rho_m$th task computation on any device are given in (9) [37], where $c_l$ and $f_l$ are the number of FLOPS per CPU cycle and the CPU frequency, respectively, and $l$ identifies a VU ($m$), an RSU ($n$), or the MBS ($M$). In (9), $P_{c,l}$ is the computation power used by the generic $l$th device.
Task Communication Model: Since we assume partial computation offloading, each VU transmits a portion of its task to the assigned EN and receives back the result. Similarly, ENs receive tasks from VUs and send back the results. In general, the transmission time and energy between a generic node $k$ and a generic node $l$ for task $\rho_k$ are given by (10), where $r_{kl}(\tau_i)$ is the data rate of the link between the two nodes, while $P^t_k$ is the transmission power of the $k$th node. Similarly, the reception time and energy to receive the result of size $D^r_{\rho_k}$ from the $l$th EN by the $k$th node are given by (11), where $P^r_k$ is the power spent for receiving data. The channel transmission rate between generic nodes $k$ and $l$ at the $i$th interval is modeled as in [22], [38], where $P^t_k$ is the transmission power of node $k$, $b^{\rho_k}_l(\tau_i)$ is the communication bandwidth, $\sigma^2$ is the noise power, and $I_{kl}(\tau_i)$ is the interference due to any transmitting node, except $k$, towards node $l$; the total interference during the uplink communication (i.e., VU to RSU) is computed over all such transmitting nodes. For the downlink, instead, we neglect the interference by assuming an orthogonal frequency assignment among RSUs, as well as orthogonal RSU-to-VU transmissions.
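The computation and communication costs described above can be illustrated with the short Python sketch below. It is a sketch under stated assumptions, not a reproduction of (9)-(11): computation time is taken as the requested cycles divided by the product of c_l and f_l, link time as the payload over the data rate, energy as power times time, and the rate as a Shannon-type expression. The channel gain argument and the Byte-to-bit conversion are assumptions introduced for the example, and all names are hypothetical.

```python
import math


def compute_cost(cycles, c_per_cycle, f_hz, p_comp_w):
    """(time, energy) to process `cycles` on a device with rate c * f."""
    t = cycles / (c_per_cycle * f_hz)
    return t, p_comp_w * t


def link_cost(size_bytes, rate_bps, power_w):
    """(time, energy) to send or receive `size_bytes` at `rate_bps`.

    ASSUMPTION: sizes are in Byte and rates in bit/s, hence the factor 8.
    """
    t = 8.0 * size_bytes / rate_bps
    return t, power_w * t


def shannon_rate(bandwidth_hz, p_tx_w, channel_gain, noise_w, interference_w=0.0):
    """Shannon-type rate with signal-to-interference-plus-noise ratio.

    ASSUMPTION: `channel_gain` is a linear power gain; it is not among the
    symbols listed in the text and is introduced here for illustration.
    """
    sinr = p_tx_w * channel_gain / (noise_w + interference_w)
    return bandwidth_hz * math.log2(1.0 + sinr)


# Hypothetical uplink: 1 MHz, unit SINR, so the rate equals the bandwidth.
rate = shannon_rate(1e6, p_tx_w=1.0, channel_gain=1.0, noise_w=0.5, interference_w=0.5)
t_tx, e_tx = link_cost(125_000, rate, power_w=2.0)
t_c, e_c = compute_cost(4e9, c_per_cycle=2.0, f_hz=1e9, p_comp_w=3.0)
```

Setting the interference to zero reproduces the downlink assumption of orthogonal transmissions.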

EN Operating Modes:
For improving the overall energy efficiency, we assume that ENs can be either in a standby or an active state. ENs in a standby state are not able to serve any VU and thereby reduce the overall energy consumption. A switching process is assumed for switching ENs from the standby to the active state, with additional switching time and energy. The amount of energy consumed for switching the $n$th EN from the standby to the active state is $E_{sw,n} = P_{sw,n} T_{sw,n}$ [39], where $P_{sw,n}$ is the consumed switching power and $T_{sw,n}$ is the switching time. The amount of time consumed by the $n$th EN for providing offloading services to $VU_m$ depends on $T^{\rho_m}_{c,n}$, $T^{\rho_m}_{tx,nm}(\tau_i)$, and $T^{\rho_m}_{rx,mn}(\tau_i)$, which are the times required for the task computation, transmission, and reception between the $n$th EN and the $m$th VU, respectively.
The amount of energy consumed depends on the operating mode. The $n$th EN goes into standby mode if no service request from any VU in its coverage area is mapped to it, i.e., $a(m, n) = 0, \forall m$. The total energy consumption of all ENs operating in the standby mode is obtained by summing over $N_{st}(\tau_i) = \{n \mid K_n(\tau_i) = 0, \forall n\}$, the set of ENs operating in the standby mode. Also, $E^{st}_{en,n}(\tau_i)$ is the amount of energy consumed by the $n$th EN in standby, where $P_{sd,n}$ is the power consumed during standby mode, which depends upon the computation hardware of the $n$th EN. Similarly, the amount of energy consumed by the $n$th EN while serving the $m$th VU includes $P_{0,n}$, the power consumed for the basic circuit operations, and $E_{sw,n}$, the switching energy required. It should be noted that, as the switching operation occurs only once, if the number of VUs requesting services (i.e., $K_n(\tau_i)$) from a particular EN increases, the switching energy per VU scales down. $E^{\rho_m}_{c,n}$, $E^{\rho_m}_{tx,nm}(\tau_i)$, and $E^{\rho_m}_{rx,mn}(\tau_i)$ are the energies required during task computation, transmission, and reception of data between the $n$th EN and the $m$th VU, respectively.
Task Offloading Process: If the $m$th VU is assigned to the $n$th EN, then the time and energy required to offload the portion of the task with offloading parameter $\alpha^{\rho_m}$ to the selected EN and to get back the result in the $i$th interval follow from (10) and (11). Also, the amount of time and energy consumed on the $n$th EN for providing services to the $m$th VU follows from (13) and (15). Thus, the total time and energy cost required for the offloading process is given by (18), where (18b) is constituted by two parts (i.e., EN and VU energy) that can be based upon different energy sources and can have different utility costs. Therefore, for having a properly balanced energy cost over the offloading process, we introduce $w_1$ as a weighting coefficient in the range between 0 and 1.
Local Computation: The amount of time and energy required for the local computation of the remaining task in the $i$th interval is obtained from (9).

Partial Offloading Computation: The delay and the energy consumed during the task processing phases when partial offloading is performed (in the $i$th interval) follow from (18)-(19), where the local and offloading processing are supposed to be performed in parallel. Each vehicle should finish the offloading process and receive the result back within the sojourn time.

Problem Formulation: The main aim of this work is to optimize the network-wide performance of the VEC-enabled vehicular network. We aim to optimize the performance in terms of the overall latency and energy consumed during the offloading process towards edge servers by selecting proper ENs and offloading amounts. The latency and energy requirements of both sides (i.e., VUs and RSU-based edge servers) are considered during the offloading process. The joint latency and energy minimization problem P1 is defined in (22), subject to the constraints C1-C6, where C2 corresponds to Eqs. (5a), (5b), and (5c). Here, $\mathcal{A} = \{\alpha^{\rho_m}\}_M$ is the computation offloading matrix, $A$ is the VU-EN assignment matrix defined previously, and $\gamma_1$ and $\gamma_2$ are weighting coefficients for balancing latency and energy consumption. The objective function in P1 includes the overall latency and the VU and RSU side energy costs, including both active and standby mode costs. C1 states that each VU can select at most one RSU for the computation offloading. C2 provides the limits over the number of user requests, processing capacity, and bandwidth resource blocks requested by VUs towards ENs, while C3 puts a limit on the maximum processing time as one of the task requirements. According to C4, for avoiding handover phenomena and the related latency, each VU should complete the offloading process before it leaves the coverage of the selected RSU. In order to have a valid offloading process, according to C5, the weighted energy consumed on the VU for processing a complete task should be lower than the total weighted energy required to compute a complete task locally. C6 states that the two weighting coefficients ($\gamma_1$, $\gamma_2$) should be between 0 and 1 with a sum equal to 1. Additionally, the energy coefficient $w_1$ can take a value between 0 and 1.
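The interplay between parallel processing, the deadline and sojourn-time constraints, and the weighted objective can be sketched as follows. This is a simplified, hypothetical illustration rather than the formulation in (22): it assumes that offloading and local costs scale linearly with the processed portion (which ignores fixed terms such as the switching time), uses a single energy term instead of the $w_1$-weighted EN and VU parts, and all names are introduced for the example only.

```python
def partial_offloading_cost(alpha, t_off_full, e_off_full, t_loc_full, e_loc_full):
    """Return (delay, energy, t_off) for an offloaded portion alpha in [0, 1].

    ASSUMPTION: costs scale linearly with the processed portion.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    t_off = alpha * t_off_full
    t_loc = (1.0 - alpha) * t_loc_full
    # Local and remote processing run in parallel, so the slower one
    # determines the completion time; energies simply add up.
    delay = max(t_off, t_loc)
    energy = alpha * e_off_full + (1.0 - alpha) * e_loc_full
    return delay, energy, t_off


def is_feasible(delay, t_off, t_max, t_sojourn):
    """Deadline (C3-like) and sojourn-time (C4-like) checks."""
    return delay <= t_max and t_off <= t_sojourn


def weighted_cost(delay, energy, gamma_1):
    """Weighted sum with gamma_1 + gamma_2 = 1 (C6-like)."""
    if not 0.0 <= gamma_1 <= 1.0:
        raise ValueError("gamma_1 must lie in [0, 1]")
    return gamma_1 * delay + (1.0 - gamma_1) * energy


# Hypothetical task: full offloading takes 2 s / 4 J, full local 6 s / 10 J.
delay_half, energy_half, t_off_half = partial_offloading_cost(0.5, 2.0, 4.0, 6.0, 10.0)
delay_bal, energy_bal, t_off_bal = partial_offloading_cost(0.75, 2.0, 4.0, 6.0, 10.0)
```

In this toy case, offloading three quarters of the task balances the two parallel branches and halves the delay compared with offloading one half, which is why the offloaded amount must be chosen jointly with the EN.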

III. MDP FORMATION
When solving the problem in (22), we aim to minimize the overall latency and energy consumed by finding, for each VU in the MBS service area, the proper EN and the amount of data to be offloaded. In this work, we consider the MDP-based RL approach to solve the problem at hand. The basic elements of the MDP model include the state-space, action-space, reward function, and environment dynamics. However, modeling environment dynamics (i.e., state transition probabilities of MDP states) over a highly uncertain vehicular environment can be a challenging task. Fig. 2 provides an overview of the different elements discussed in the following parts. In the following, we first model several possible vehicular scenarios in which a reference VU can find itself over its course. This scenario set can be used to form a proper MDP model aimed at reducing uncertainty over the environment. To avoid mistakes during the network selection and computation offloading process, each VU scenario needs to be treated separately. After that, we present the main MDP elements (i.e., state-space, action-space, reward function, and environment dynamics) for the considered problem. After a detailed analysis of the state transition probability matrix, we propose generic time-dependent expressions for finding the state transition probability values in different scenarios based on the VU's state and the action performed.

A. VU Scenarios Definition
Different VU scenarios are formed based upon the VUs' physical locations, the number of ENs available for offloading, and the number of nearby competing VUs, aiming at creating a more reliable MDP model with reduced uncertainty. VUs can use V2X communication technologies for acquiring useful information about the number of nearby competing VUs and available EN servers. As shown in Fig. 1, we have used a grid-based approach for limiting the number of possible scenarios that depend upon the actual VU position. In the considered grid-based approach, a section of the road is divided into $G$ segments of length $l_g$, within which each VU is placed according to its location parameters. Thus, each VU is associated with a specific segment number $g^{id}_m(\tau_i) \in \{1, 2, \ldots, G\}$. Each VU can exploit a different number of ENs for offloading, where $E_m(\tau_i)$ is the set of available ENs for the mth VU to perform the offloading operation in the interval $\tau_i$. Also, we define $V_m(\tau_i)$ as the number of nearby competing VUs, ranging from 0 to $NV_{max}$, requesting offloading services from the ENs in the set $E_m(\tau_i)$.
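The grid-based placement can be illustrated with a short Python sketch. The mapping used here (floor of the position divided by the segment length, clipped to the range 1 to G) is an assumption about how a VU position could be converted into a segment number; the function name and the one-dimensional road coordinate are hypothetical.

```python
import math


def segment_id(position_m, l_g, G):
    """Map a 1-D road position (meters) to a segment number in 1..G.

    Hypothetical helper: the floor-and-clip rule is an assumption.
    """
    if l_g <= 0 or G < 1:
        raise ValueError("l_g must be positive and G at least 1")
    idx = math.floor(position_m / l_g) + 1
    # Clip positions that fall outside the modeled road section.
    return min(max(idx, 1), G)
```

For example, with a segment length of 3.3 m, a VU located 3.4 m from the start of the section falls into the second segment.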
In the considered multi-user vehicular network, moving VUs can impact each other's network selection and offloading strategies. Each VU should analyze the surrounding environment by finding the competing VUs and their offloading decisions, selected ENs, etc. Since all VUs are assumed to generate their task requests simultaneously (i.e., at each ith interval), it is impossible to have such information in advance. In that case, VUs can make offloading decisions by assuming that no other VU is requesting a service, leading to a selfish approach. However, this may lead to incorrect node selection and offloading decisions. Another way to tackle this problem is to define an MDP that provides a joint solution for all the participating VUs. However, the presence of a large number of VUs can quickly lead to unbearable complexity and computation requirements. Thus, neither of these two extreme approaches is suitable for solving the given problem, and some assumption about the VU's surrounding environment is needed to avoid both incorrect offloading strategies and additional complexity. In the following, we consider four assumptions that a VU can make about the surrounding environment.
• Minimum distance-based VU-EN assignment: In this case, the mth VU assumes that all the $V_m(\tau_i)$ VUs offload their data to the nearest ENs based upon their physical locations. Thus, $\forall VU_{m'} \in V_m(\tau_i)$:
• Maximum sojourn time-based VU-EN assignment: In this case, the mth VU assumes that all $V_m(\tau_i)$ VUs offload their data to the ENs with the maximum available sojourn time. Thus, $\forall VU_{m'} \in V_m(\tau_i)$:
It should be noted that this approach only considers the assignment towards RSU nodes (since the MBS always has a high sojourn time). If VUs are not able to find any nearby RSU nodes, they are assigned to the MBS.
• Probabilistic VU-EN assignments: In this approach, the probability of the $m'$th VU selecting the nth EN is given by:
• Position-based VU-EN assignments: In this case, each nearby competing VU is allocated to the ENs based on the distance available before it leaves the EN's coverage range and the distance between the VU and the EN. Thus, $\forall VU_{m'} \in V_m(\tau_i)$:
Based upon the above discussion, for the mth VU, a vector $V_m(\tau_i)$ corresponding to the number of nearby VUs assigned to each EN $n \in E_m(\tau_i)$ is formed as:
where the VU-EN assignment (i.e., $a_{m',n}(\tau_i)$) is based upon any of the four methods presented above.
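As a concrete illustration of the first assumption (minimum distance-based assignment), the sketch below counts how many competing VUs would be attributed to each EN. The one-dimensional positions, the function name, and the tie-breaking rule (lowest EN index wins) are assumptions made for this example only.

```python
def nearest_en_counts(vu_positions, en_positions):
    """Count competing VUs per EN under a nearest-EN assumption.

    Hypothetical helper: positions are 1-D road coordinates in meters.
    """
    counts = [0] * len(en_positions)
    for x in vu_positions:
        # Index of the closest EN; ties go to the lowest index.
        nearest = min(range(len(en_positions)),
                      key=lambda n: abs(x - en_positions[n]))
        counts[nearest] += 1
    return counts
```

For ENs placed at 0 m, 50 m, and 100 m and competing VUs at 10 m, 40 m, 95 m, and 60 m, the resulting per-EN counts are one, two, and one, respectively.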
In the end, for the mth VU, a scenario vector can be defined as:
The number of possible scenarios is limited by the parameters $G$, $E_{max}$, and $NV_{max}$. Each scenario needs to be treated separately for finding the proper EN and offloading amounts. Every vehicular scenario may have an independent optimal policy that needs to be determined through proper analysis. In the end, $N$ is the set of all possible VU scenarios.
In the next part, we define the State Space, Action Space, Environment Dynamics or State Transition Probabilities, and Reward Function, as basic elements of an MDP approach for the problem at hand.

B. MDP Elements
The MDP is a stochastic process that evolves over time and is characterized by the state space (ST), action space (AS), reward function (R), and environment dynamics (P). The MDP model can be defined as a tuple $\langle ST, AS, R, P \rangle$.
1) State-Space (ST): In a multi-user vehicular environment, the available resources for the computation offloading process change continuously over time and are a function of the offloading and network selection decisions taken by individual vehicles. Therefore, we define a discrete state-space set as a function of the resources available for computation offloading. For each scenario $\nu$, the related state-space is a function of the sojourn time, the required latency, the VU resources, and the resources of the available RSUs; thus, each state $s_\nu$ at time $\tau_i$ is defined as:
We limit the multi-dimensional state space to $N$ scenarios; hence, $\nu = 1, \ldots, N$. Moreover, we assume that the environment states observed by each VU during the joint network selection and computation offloading process can be modeled through proper binary functions. If the mth VU is assigned to the nth EN and performs the offloading operation with offloading parameter $\alpha_{\rho_m}$, the environment can be modeled through the three binary functions in (35)-(37), where $F^1_{\rho_m,n}(\tau_i)$, $F^2_{\rho_m,n}(\tau_i)$ and $F^3_{\rho_m,n}(\tau_i)$ depend upon the sojourn time constraint (21), the application latency requirement (25) and the energy constraint (27), respectively, and $F^3_{\rho_m,n}(\tau_i)$ includes both active and standby mode energy costs of RSU nodes. Thus, at $\tau_i$, the state of the mth VU in scenario $\nu$ is given by:
where $S_\nu = \mathbb{Z}_2^3$ is the complete state space for the scenario $\nu$, containing all possible binary combinations of $F^1$, $F^2$ and $F^3$.
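Since each state is a triple of binary indicators, the state space of one scenario has eight elements. A minimal Python sketch of this enumeration is given below; the names `STATES` and `state_index` and the chosen ordering are illustrative assumptions, with 1 denoting a violated constraint and 0 a satisfied one.

```python
from itertools import product

# All binary combinations (F1, F2, F3); 0 = constraint satisfied,
# 1 = constraint violated. The ordering is an illustrative choice.
STATES = list(product((0, 1), repeat=3))


def state_index(state):
    """Return the position of a (F1, F2, F3) triple in STATES."""
    f1, f2, f3 = state
    return 4 * f1 + 2 * f2 + f3
```

Here the all-zero state, in which the sojourn time, latency, and energy constraints are all satisfied, is the first entry of the list.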

2) Action-Space (AS):
The action space defines all the possible actions available during the learning process. If the mth VU belongs to the scenario $\nu$, it can explore the available ENs ($E_m(\tau_i)$) by properly setting a binary vector $EN_\nu(\tau_i) = \{0, 1\}^{E_m(\tau_i)}$ mapping the RSU selection among the $E_m(\tau_i)$ available in the given scenario. At the same time, the offloaded amount can be selected from a discrete set of values given by $\alpha_{\rho_m}(\tau_i) \in \{0, \Lambda, 2\Lambda, \ldots, 1\}$, where $0 < \Lambda < 1$ is the step change of the offloading amount.
The generic action $a_\nu$ for the $\nu$th scenario at time $\tau_i$ can be defined as $a_\nu(\tau_i) = \{EN_\nu(\tau_i), \alpha_{\rho_m}(\tau_i)\}$, where $EN_\nu(\tau_i)$ is a binary vector with length $\nu$, where a 1 in the nth position corresponds to the selected EN. The complete action space for scenario $\nu$ is given by $A_\nu = \{a_\nu(\tau_i)\}$.
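The structure of the action space can be sketched in Python as the Cartesian product of one-hot EN selections and discrete offloading levels. The function name, the tuple-based encoding, and the requirement that the step divides 1 evenly are assumptions made for this illustration.

```python
def build_action_space(num_ens, step):
    """Enumerate (one-hot EN vector, offloading fraction) pairs.

    Hypothetical helper: `step` plays the role of the step change of
    the offloading amount and is assumed to divide 1 evenly.
    """
    n_levels = round(1.0 / step)
    if n_levels < 1 or abs(n_levels * step - 1.0) > 1e-9:
        raise ValueError("step must divide 1 evenly")
    # Build levels from integers to avoid accumulating float error.
    levels = [k / n_levels for k in range(n_levels + 1)]
    actions = []
    for n in range(num_ens):
        one_hot = tuple(1 if j == n else 0 for j in range(num_ens))
        for alpha in levels:
            actions.append((one_hot, alpha))
    return actions
```

With three available ENs and a step of 0.25, this enumeration yields 15 actions. Note that all actions with a zero offloading fraction are equivalent in effect; they are kept here only to keep the enumeration uniform.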
Once selected, action $a_\nu(\tau_i)$ can change the state of the functions $F^1(\tau_i)$, $F^2(\tau_i)$ and $F^3(\tau_i)$ with a certain probability⁵. Such probabilistic transitions can be defined through:
where $P^{F^1}_{(\bar{i},j)}(a_\nu(\tau_i))$ is the transition probability of $F^1(\cdot)$ from state $\bar{i}$ to state $j$ at $\tau_i$ through the action $a_\nu(\tau_i)$. Here, $\delta$ is the time step of the MDP process. Similarly, for $F^2(\cdot)$ and $F^3(\cdot)$ the transition probability expressions are given by:
In general, $F^1(\cdot)$, $F^2(\cdot)$ and $F^3(\cdot)$ can have different probabilistic transitions for any given action $a_\nu(\tau_i)$. Here, we introduce three transition matrices by considering all the possible transitions of $F^1(\cdot)$, $F^2(\cdot)$, and $F^3(\cdot)$. For $F^1(\cdot)$, the transition matrix $P^{F^1}(a_\nu(\tau_i))$ is given by
$$P^{F^1}(a_\nu(\tau_i)) = \begin{bmatrix} P^{F^1}_{(0,0)}(a_\nu(\tau_i)) & P^{F^1}_{(0,1)}(a_\nu(\tau_i)) \\ P^{F^1}_{(1,0)}(a_\nu(\tau_i)) & P^{F^1}_{(1,1)}(a_\nu(\tau_i)) \end{bmatrix}$$
⁵ For simplicity of notation, the subscripts $\rho_m, n$ are omitted hereafter.
with $P^{F^1}_{(0,0)}(a_\nu(\tau_i)) + P^{F^1}_{(0,1)}(a_\nu(\tau_i)) = 1$ and $P^{F^1}_{(1,0)}(a_\nu(\tau_i)) + P^{F^1}_{(1,1)}(a_\nu(\tau_i)) = 1$. Similarly, for $F^2$, the transition matrix $P^{F^2}(a_\nu(\tau_i))$ is defined with $P^{F^2}_{(0,0)}(a_\nu(\tau_i)) + P^{F^2}_{(0,1)}(a_\nu(\tau_i)) = 1$ and $P^{F^2}_{(1,0)}(a_\nu(\tau_i)) + P^{F^2}_{(1,1)}(a_\nu(\tau_i)) = 1$. Also, for the case of $F^3(\cdot)$, the transition matrix $P^{F^3}(a_\nu(\tau_i))$ is defined in the same way.
3) Reward Function (R): The reward function $R(s_\nu(\tau_i), a_\nu(\tau_i))$ is defined as the joint objective function of time and energy consumed for complete task processing (22). At the ith interval, if the mth VU is in state $s_\nu(\tau_i)$ and decides to take an action $a_\nu(\tau_i)$ by selecting the nth RSU and $\alpha_{\rho_m}(\tau_i)$ as the offloading amount, the instant reward it receives is given by:
4) State Transition Matrix (P): For the MDP, the state transition matrix characterizes the environment dynamics through the probabilistic transitions from the present state to the next state. Thus, for scenario $\nu$, the state transition probability at $\tau_i$ is given by:
where $\{F^1(\tau_i), F^2(\tau_i), F^3(\tau_i)\}$ is the current state of the VU at $\tau_i$ that takes action $a_\nu(\tau_i)$.
We assume that the state transitions of $F^1(\tau_i)$, $F^2(\tau_i)$, and $F^3(\tau_i)$ can be considered as independent events; hence (43) can be rewritten as:
where each term is based upon (39)-(41). A detailed analysis of these probability values is given below.
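Under the independence assumption, the joint transition probability between two states is the product of the three per-function transition probabilities. The following Python sketch builds the resulting 8×8 matrix for one action from three 2×2 matrices. The function names and the numerical probabilities in the usage note are illustrative assumptions, not values from this work.

```python
from itertools import product


def binary_transition_matrix(p01, p10):
    """2x2 row-stochastic matrix from the two off-diagonal entries."""
    return [[1.0 - p01, p01],
            [p10, 1.0 - p10]]


def joint_transition_matrix(P1, P2, P3):
    """8x8 matrix over (F1, F2, F3) triples, assuming independence.

    Rows and columns follow the order of itertools.product, i.e.
    (0,0,0), (0,0,1), ..., (1,1,1).
    """
    states = list(product((0, 1), repeat=3))
    return [[P1[s[0]][t[0]] * P2[s[1]][t[1]] * P3[s[2]][t[2]]
             for t in states]
            for s in states]
```

For instance, with failure probabilities of 0.1, 0.2, and 0.5 for the three functions, the probability of moving from the all-satisfied state to the all-violated state is their product, 0.01, and every row of the joint matrix still sums to one.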
In (39), the four probabilistic transitions for the binary-valued function $F^1(\cdot)$ are set. As shown in (35), $F^1(\cdot)$ becomes 1 if the computation offloading process fails to follow the sojourn time constraint; on the other hand, it becomes 0 if the process follows the constraint. The two probability values $P^{F^1}_{(0,1)}(a_\nu(\tau_i))$ and $P^{F^1}_{(1,0)}(a_\nu(\tau_i))$ model the behavior of $F^1(\cdot)$ based upon the action taken. These transitions can depend upon several factors, including the number of VUs assigned to the selected ENs, the available sojourn time value (which differs for different ENs), the offloading amount, etc. Since modeling the exact nature of these transitions can be hard, we resort to exponential distribution functions for modeling the behavior of $F^1(\cdot)$.
In case the mth VU in scenario $\nu$ selects the nth EN through the action $a_\nu(\tau_i)$, we define:
where $\lambda_1(\tau_i)$ is a parameter modeling the slope of the exponential function and is determined from the action $a_\nu(\tau_i)$. According to (45), if the selected action is characterized by $\alpha_{\rho_m}(\tau_i) = 0$, the probability of violating the offloading constraint becomes zero. The value of $\lambda_1(\tau_i)$ depends upon several factors. In particular, $\lambda_1(\tau_i)$ can increase if the action performed by the mth VU is characterized by a high $\alpha_{\rho_m}(\tau_i)$, if the VU selects an EN with a higher number of VUs requesting services (i.e., large $V^n_m(\tau_i)$), or if the VU-EN pair is characterized by a low sojourn time. As a result, the probability that $F^1(\cdot)$ changes its state from 0 to 1 (i.e., failure of the offloading constraint) becomes high, as captured by (45). $K_{11}$, $K_{12}$ and $K_{13}$ are weighting coefficients assigning proper weights to each of these parameters.
$P^{F^1}_{(1,0)}(a_\nu(\tau_i))$ models the case where, with the selected action $a_\nu(\tau_i)$, the VU is able to satisfy the offloading time constraint, where:
where $\lambda_2(\tau_i)$ is a parameter modeling the slope of the exponential function and is determined from the action $a_\nu(\tau_i)$. This means that, if the selected action is characterized by $\alpha_{\rho_m}(\tau_i) = 0$, the VU's offloading time becomes zero and, as a result, it satisfies the sojourn time constraint. Also, from the expression of $\lambda_2(\tau_i)$ and (46), it can be seen that, when increasing $\alpha_{\rho_m}(\tau_i)$ and $V^n_m(\tau_i)$, the probability that the mth VU respects the sojourn time constraint is reduced. A reduced value of $T^{soj}_{m,n}(\tau_i)$ for the VU-EN pair can also reduce the chances that the VU respects the sojourn time constraint. Here, $K_{21}$, $K_{22}$ and $K_{23}$ are weighting coefficients.
The second function, $F^2(\cdot)$, models the VU's behavior with respect to the task latency constraint, where each VU needs to perform the task processing within the task latency requirements. In this case, $P^{F^2}_{(0,1)}(a_\nu(\tau_i))$ defines the probability that the VU fails to satisfy the task latency constraint for a selected action $a_\nu(\tau_i)$ and is given by:
where $\lambda_3(\tau_i)$ is a parameter modeling the slope of the exponential function and is determined from the action $a_\nu(\tau_i)$. This means that, with its limited resources, if a VU performs the task processing by itself without offloading any data towards ENs, it always fails to satisfy the task latency requirements. In addition, a strict task latency requirement ($T_{\rho_m}$) combined with a selected EN that already has a large number of VUs ($V^n_m(\tau_i)$) requesting services increases the failure probability of the task latency constraint.
If a VU offloads a very small percentage of data towards an EN, the local computation time required for processing the remaining task can be high. On the other hand, if a VU offloads a larger amount of data towards an EN, the offloading time can be higher, mainly because of unreliable channel conditions, limited EN resources, and other competing VUs. To accommodate these facts, the behavior of $P^{F^2}_{(0,1)}(a_\nu(\tau_i))$ with respect to the offloading parameter $\alpha_{\rho_m}(\tau_i)$ is modeled as a quadratic function. Finally, $K_{31}$ and $K_{33}$ define the weights assigned to the latency and competing-vehicle parameters, while $K_{32}$ prevents the second term from becoming infinite.
$P^{F^2}_{(1,0)}(a_\nu(\tau_i))$ models the VU's chances of satisfying the task latency requirements and is defined as:
where $\lambda_4(\tau_i)$ is a parameter modeling the slope of the exponential function and is determined from the action $a_\nu(\tau_i)$. In case the VU does not offload any data, it is not able to satisfy the task latency requirements. Otherwise, the behavior of $P^{F^2}_{(1,0)}(a_\nu(\tau_i))$ depends upon the offloading parameter, the number of competing VUs, and the task latency requirements. $K_{41}$ and $K_{43}$ are weighting coefficients, while $K_{42}$ prevents the second term from becoming infinite.
The third function, F 3 (•), models the VU behavior in terms of energy constraint.If the overall offloading process energy becomes higher than the energy required to compute the complete task locally, the offloading process becomes inefficient.
$P^{F^3}_{(0,1)}(a_\nu(\tau_i))$ gives the probability that the VU fails to satisfy the energy constraint for a selected action $a_\nu(\tau_i)$ and is defined as:
where $\lambda_5(\tau_i)$ is a parameter modeling the slope of the exponential function and is determined from the action $a_\nu(\tau_i)$. If, with the selected action $a_\nu(\tau_i)$, the mth VU offloads a large amount of data towards an EN with a larger $V^n_m(\tau_i)$, there is a high chance that the offloading process energy becomes higher than the local computation energy. However, if the VU does not offload any data towards the EN, it always satisfies the energy constraint. Here, $K_{51}$ and $K_{52}$ are weighting coefficients.
$P^{F^3}_{(1,0)}(a_\nu(\tau_i))$ models the chances that the VU satisfies the energy constraint based upon the selected action $a_\nu(\tau_i)$:
where $\lambda_6(\tau_i)$ is a parameter modeling the slope of the exponential function and is determined from the action $a_\nu(\tau_i)$. The chances that the VU satisfies the energy constraint decrease as $\alpha_{\rho_m}(\tau_i)$ and $V^n_m(\tau_i)$ increase. $K_{61}$ and $K_{62}$ are weighting coefficients.
By using (45)-(50), the transition probability matrices for $F^1(\cdot)$, $F^2(\cdot)$, and $F^3(\cdot)$ can be determined. In the following section, we define a value iteration algorithm for solving the MDP.

IV. MDP-BASED JOINT NETWORK SELECTION AND COMPUTATION OFFLOADING
In the previous section, the elements of the MDP model were presented. By solving the proposed MDP model, VUs can find a proper EN and an offloading amount that minimize the overall latency and the energy consumed during the task processing operations. The solution set can be defined as a policy function $\pi_\nu = \{\pi_\nu(s_\nu(\tau_i + \delta)), \forall \delta\}$ that maps every state $s_\nu \in ST$ to an action $a_\nu \in AS$. Selecting different actions can result in different policy functions, where the aim is to find an optimal policy that corresponds to the minimum delay and energy cost during vehicular task processing. For every policy $\pi_\nu$, a value function $V^{\pi_\nu}(s_\nu(\tau_i))$, corresponding to a state $s_\nu(\tau_i)$, can be defined for analyzing its performance. In general, $V^{\pi_\nu}(s_\nu(\tau_i))$ corresponds to the expected value of the discounted sum of the total reward received by following the policy $\pi_\nu$ from state $s_\nu(\tau_i)$, and can be defined as:
where $\gamma \in [0, 1]$ is the discount factor, $R(s_\nu(\tau_i + \delta), \pi_\nu(s_\nu(\tau_i + \delta)))$ is the immediate reward received for following the policy $\pi_\nu$ at time $\tau_i + \delta$ from the state $s_\nu(\tau_i + \delta)$, $\Delta$ is the maximum number of steps considered during the MDP evaluation (i.e., the episode length), and $E\{\cdot\}$ corresponds to the expected value. Thus, the value function evaluates a particular policy function by assigning a numeric value to each state, and can be used to compare the performance of different policies. In the end, the following optimization problem can be formulated to find the best possible policy function associated with state $s_\nu(\tau_i)$:
where $\Pi_\nu$ corresponds to the set of policy functions that can be explored.
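For a single realized trajectory, the discounted sum inside the expectation can be computed as in the following Python sketch; the value function is then the average of this quantity over the trajectories generated by the policy. The function name and the list-based interface are illustrative assumptions.

```python
def discounted_return(rewards, gamma):
    """Discounted sum of per-step rewards along one trajectory.

    `rewards[k]` is the immediate reward received k steps after the
    starting interval; `gamma` is the discount factor in [0, 1].
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return sum((gamma ** k) * r for k, r in enumerate(rewards))
```

For a three-step episode with unit rewards and a discount factor of 0.5, the result is 1 + 0.5 + 0.25 = 1.75, showing how later rewards contribute less to the value of the starting state.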
As shown by many works (e.g., [13], [40]), the problem defined in (51) can be converted into a Bellman optimality equation given by:
Different approaches can be used to solve the problem in (52); however, the value iteration approach is widely known for its fast convergence and easy implementation [41]. Therefore, below we present a value iteration approach for solving the MDP designed in the previous section, finding an optimal policy that minimizes the task processing time and energy during the offloading process over VNs.
The value iteration method allows finding an optimal policy and value function for MDP models. Algorithm 1 describes the steps involved in the value iteration process. For every scenario $\nu$, the process begins by initializing the value of each state to $\infty$ and the iteration count (it) to 0 (Line 2). For each state-action pair, the state value is determined by using (53) (Line 5). Then, the state value and the corresponding optimal policy $\pi^*_\nu(s_\nu(\tau_i))$ associated with state $s_\nu$ are determined by using (54) and (55) (Lines 7-8). The iterative process continues until the change in all state values becomes less than the predefined convergence parameter (Lines 10-14). In the end, the algorithm returns the set of optimal policy functions $\{\pi^*_\nu\}$ associated with all possible scenarios in which VUs can find themselves on the road (Line 16).
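The general shape of such a procedure can be sketched in Python as follows. This is a generic, textbook-style value iteration for a single scenario in which rewards are treated as costs to be minimized; it is not a transcription of Algorithm 1. In particular, state values are initialized to zero here rather than to infinity, and the function name, the nested-list data layout, and the default parameters are all assumptions.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-6, max_iter=1000):
    """Cost-minimizing value iteration for one scenario.

    P[a][s][t]: probability of moving from state s to t under action a.
    R[s][a]:    immediate cost of taking action a in state s.
    Returns (V, policy) with policy[s] the index of the best action.
    """
    n_states, n_actions = len(R), len(R[0])
    V = [0.0] * n_states  # assumption: zero initialization
    policy = [0] * n_states
    for _ in range(max_iter):
        new_V = []
        for s in range(n_states):
            # Bellman backup: immediate cost plus discounted
            # expected value of the next state, for every action.
            q = [R[s][a] + gamma * sum(P[a][s][t] * V[t]
                                       for t in range(n_states))
                 for a in range(n_actions)]
            best = min(range(n_actions), key=q.__getitem__)
            policy[s] = best
            new_V.append(q[best])
        delta = max(abs(x - y) for x, y in zip(new_V, V))
        V = new_V
        if delta < tol:  # convergence check on all state values
            break
    return V, policy
```

In a multi-scenario setting, a routine of this kind would be run once per scenario, each with its own transition matrices and cost table, and the resulting policies collected into a set.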
The time complexity of the traditional value iteration process can be estimated as $O(\Delta \cdot |ST| \cdot |AS|)$, with $\Delta$ being the maximum number of time steps considered, $|ST|$ the state space dimension, and $|AS|$ the action space dimension. With the involvement of $N$ scenarios, the time complexity expression becomes $O(N \cdot \Delta \cdot |ST| \cdot |AS|)$. The considered scenario-based modeling can reduce the state and action space dimensions significantly by limiting the number of VUs per scenario, compared to one-shot approaches where all VUs are considered together. Especially for the case of VNs, such an approach can be beneficial given the importance of the VUs' local environments in the decision-making process (i.e., nearby VUs influence a VU's decision-making more than VUs located far away from it). Additionally, time-dependent state transition probabilities can reduce the overall uncertainty in the MDP process. It should be noted that $N$, i.e., the considered number of VU scenarios, can impact the performance of the MDP process. On the one hand, a smaller $N$, corresponding to a limited set of parameters, can degrade the MDP model performance due to additional uncertainties. On the other hand, a bigger $N$ improves performance at the price of higher computational complexity.

A. Benchmark Approaches
For comparing the proposed MDP model performance, the following benchmark methods are considered:
This approach can reduce the number of handover requirements; however, the computation/communication delay and energy performance might not be optimal.

• MDP-based network selection with static offloading policy (MDP-NS): To show the impact of a joint network selection and offloading optimization, here we consider an MDP-based network selection decision optimization with a static offloading process. In particular, the MDP process (i.e., action space) is adapted to the network selection only, while considering a static offloading policy with $\alpha_{\rho_m}(\tau_i) = 0.5, \forall i, m$.
• MDP-based offloading with static network selection policy (MDP-Off): In this case, the offloading decisions are optimized, while a static network selection is considered. In particular, VUs are assumed to select the nearest EN while offloading with an optimal policy generated through the MDP process of Algorithm 1.
In the following, MDP-MD, MDP-PA, MDP-SA, and MDP-PsA stand for the MDP with minimum distance assignments, probabilistic assignments, sojourn time-based assignments, and position-based assignments of nearby VUs, respectively.

TABLE II SIMULATION PARAMETERS

V. NUMERICAL RESULTS
The proposed MDP model and the corresponding value iteration algorithm are evaluated over a Python-based simulator, using ML-related libraries such as NumPy, Pandas, and Matplotlib. The main simulation parameters are listed in Table II. In this work, we have considered that 80 RSUs with $h^R_n = 3$ m are located alongside the road network in the MBS coverage area. The number of VUs is between 200 and 1800 with $p_a = 0.2$. Each VU travels with a variable speed based upon the intelligent mobility model, with parameters $v_{max} = 15$ m/s, $s_{min} = 2$ m, $a_{max} = 0.7$ m/s², $b_{max} = 1.5$ m/s², and $t_r = 2$ s. A background noise power of $\sigma = -110$ dBm is considered.
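Assuming that the mobility model above corresponds to the widely used Intelligent Driver Model (IDM), the acceleration of a following vehicle could be computed as in the sketch below. The mapping of the listed parameters to the IDM quantities (in particular, treating $t_r$ as the desired time headway), the acceleration exponent of 4, and the function name are assumptions for illustration; they are not stated in the text.

```python
import math


def idm_acceleration(v, gap, dv, v_max=15.0, s_min=2.0,
                     a_max=0.7, b_max=1.5, t_r=2.0, delta=4.0):
    """IDM acceleration (m/s^2) of a follower vehicle.

    v:   current speed of the follower (m/s)
    gap: bumper-to-bumper distance to the leader (m), must be > 0
    dv:  approach rate, i.e. follower speed minus leader speed (m/s)
    delta is the usual IDM exponent (assumed, not from the text).
    """
    # Desired dynamic gap: minimum gap plus headway and braking terms.
    s_star = s_min + max(
        0.0, v * t_r + v * dv / (2.0 * math.sqrt(a_max * b_max)))
    # Free-road term minus interaction term.
    return a_max * (1.0 - (v / v_max) ** delta - (s_star / gap) ** 2)
```

On an empty road the vehicle accelerates at close to the maximum rate from standstill and stops accelerating as it approaches the maximum speed, while a short gap to the leader produces a negative acceleration.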
Also, each RSU can serve up to $K_{max} = 12$ VUs. Additionally, the communication channel parameters are $\beta_0 = -25$ dB and $\theta = 2.5$. The RSU switching parameters include the switching time $T_{sw,n} = 25$ ms and the switching power $P_{sw,n} = 0.2$ W. Also, when the nth EN is operating in the standby mode, the standby power is $P_{sd,n} = 0.42$ W. The power consumed for the basic circuit operations is $P_{0,n} = 0.5$ W. The VU scenarios are based upon $l_g = 3.3$ m, $E_{max} = 4$, and $NV_{max} = 36$. Other MDP parameters include the set of weighting coefficients given by $[K_{11}, K_{12}, K_{13}, K_{21}, K_{22}, K_{23}] = [0.5, 0.07, 0.4, 0.5, 0.07, 0.4]$, $[K_{31}, K_{32}, K_{33}, K_{41}, K_{42}, K_{43}]$ = [0.08, 0.6, 0.5,
Avg. Latency and Energy Cost with Varying VUs: In Fig. 3, we present the average cost value in terms of the total latency and energy requirements of VU task processing. By varying the number of VUs, we obtain the performance of the different MDP schemes defined before and compare it with the benchmark methods. It can be seen that the proposed MDP schemes perform better than the benchmark approaches. By analyzing the surrounding environment, the different MDP schemes are able to find the proper EN and the amount to be offloaded. In particular, with a high number of VUs, the MDP-PsA approach, having better knowledge of the surrounding environment in terms of various distance measures (i.e., the distance between the VU and the ENs and the distance before it leaves the EN coverage range), performs better than the other schemes. The superiority of the MDP-PsA approach can be seen in the zoomed version of the plot. The two benchmark MDP methods (MDP-Off and MDP-NS) perform worse than the joint optimization-based approaches, mainly due to their static policies. This highlights the importance of simultaneously selecting the proper ENs and offloading the proper amount.
Number of RSU Handovers Required During Computation Offloading: If a VU fails to complete the offloading operation (which includes the transmission of the VU's data towards the selected EN, the EN processing, and receiving back the results from the EN) before leaving the coverage of the selected EN, an additional handover process/cost is required. In Fig. 4, we present such handover requirements for different numbers of VUs in terms of the average number of VUs that fail to complete the offloading operation within the time limits. It can be verified from this figure that the proposed MDP schemes (in particular MDP-PsA) perform better than the benchmark methods in terms of a reduction in the overall handover requirements. Thus, by reducing the number of handovers required, MDP schemes can reduce the service provisioning costs over vehicular environments. The benchmark MDP methods, in particular MDP-Off, suffer from more handover requests than the other MDP methods due to their imperfect offloading decisions.
Number of Service Time Constraint Failures: In general, the VUs' application latency requirements need to be respected during task processing operations; failing to do so can reduce the overall QoS. In Fig. 5, we provide the percentage of VUs that fail to satisfy the application latency requirement constraint in (25). The proposed MDP approaches are able to reduce such failures effectively and can be vital for enabling latency-critical services over VNs. Similar to the previous cases, the MDP-PsA approach outperforms the other MDP schemes, as can be seen in the zoomed version of the plot. The MDP-NS and MDP-Off methods induce higher latency costs and suffer from more service latency failures than the other MDP approaches.
Task Completion Latency: To give a better understanding of the overall latency requirements, in Fig. 6 we present the performance of the different schemes in terms of the average latency during the task processing operations. This figure shows the overall reduction of the latency cost for VU task processing operations. Through a proper understanding of the nearby environment parameters (e.g., competing VUs, available RSUs, mobility characteristics), the MDP schemes, in particular the MDP-PsA approach, can determine the proper EN and the offloading amount that yield better performance. Optimizing only the network selection or only the offloading decisions, as in the MDP-NS and MDP-Off methods, cannot guarantee optimal performance and results in higher latency.
Average Energy Consumption: In Figs. 7 and 8, we present the performance of the different schemes in terms of average energy requirements. Fig. 7 presents the average energy cost over VUs, which includes the local computation, data transmission, and reception costs. The benchmark approaches do not perform any local computation, due to which they have slightly better performance in terms of VU energy consumption. However, as shown in Fig. 8, both benchmark methods add large energy costs at the ENs. On the other hand, with proper EN selection and proper offloading decisions, all MDP schemes achieve better energy performance at the ENs. Also, as shown in Fig. 3, the overall performance of the MDP process in terms of joint latency and energy cost is better than that of the benchmark approaches. The joint energy performance of the MDP-Off and MDP-NS methods suffers from imperfect decision-making, which impacts the overall costs shown in Fig. 3.
Average Number of Active ENs: In Fig. 9, we present the average number of active ENs for varying numbers of VUs. At low VU density, only a limited number of ENs are active. With most of the ENs being inactive, the overall energy cost can be reduced compared with traditional approaches in which all ENs are active. As the VU density increases, the number of active ENs increases to satisfy all VU service requests. This allows reducing the total number of service failures. Together with the previous results, this indicates that the proposed methods are able to adapt the EN energy resources to the VUs' demands, limiting the EN energy costs along with the potential service failures.

VI. CONCLUSION
In this work, we considered the joint optimization of network selection and task offloading through a proper minimization of delay and energy for a VEC offloading system. For solving such a complex problem over a highly uncertain vehicular environment, we have proposed an MDP approach by analyzing different vehicular scenarios. The proposed MDP model considers the changing vehicular environment while making the decisions of EN selection and offloading portion. A value iteration-based method is used for solving the proposed MDP model by finding the optimal policy to be followed by each VU in the different scenarios. The simulation results show the superiority of the proposed scheme over various benchmark methods. One of the most prominent contributions is having considered the joint network selection and computation offloading problem while jointly minimizing latency and energy costs with additional energy-saving mechanisms at the edge infrastructure. Such studies were not present in the current literature and thus can motivate future readers to investigate the topic further. However, with additional granularities and joint decision-making processes, the problem becomes extremely complex to solve through traditional approaches. For this, we have proposed a novel MDP model with time-dependent state transition probabilities reducing the overall instability. However, since the MDP approach could become very complex in the case of a large parameter set, in this work we have exploited the local vehicular communication modes in the MDP process for improving the overall performance. This can motivate future readers to investigate the proposed solution methods in these directions.