Slice Sandwich: Jagged Slicing Multi-Tier Dynamic Resources for Diversified V2X Services

With the advancement of intelligent transportation systems, a series of diversified V2X applications have come into being, each with different key performance indicators (KPIs) and transmission features. Moreover, multi-tier computing, as a new system-level architecture, distributes computing and communication capabilities anywhere between the cloud and the end user. Unfortunately, the existing network paradigm for V2X services adopts a one-shot allocation of resources that ignores the inherent differences among V2X services. To cope with these problems, three types of refined network slices for V2X services are first proposed to simultaneously support heterogeneous service characteristics without excessively splitting resources. Considering the spatiotemporal correlation between service traffic and physical resources, jagged slicing of multi-tier dynamic resources, which vividly forms a “slice sandwich”, is realized by a dual-timescale intelligent resource management scheme. The inter-slice resource configuration is based on neural bandits with upper confidence bounds at each large-time period, while the exclusive resources are managed elastically by deep Q-learning according to the real-time network state at each small slot. We developed a simulation environment with Simulation of Urban Mobility (SUMO), including real-world road conditions and traffic models. The experimental results demonstrate that the proposed scheme can effectively guarantee the KPIs of V2X services and improve the system revenue compared with benchmark algorithms.


I. INTRODUCTION
With the increase in population and the development of urbanization, the transportation system is facing unprecedented pressure [1]. Many V2X (vehicle-to-everything) services have emerged to adapt to complex traffic situations and offer enjoyable driving experiences. Up to now, the Third Generation Partnership Project (3GPP) has defined 57 use cases of V2X [2], [3], containing V2V (vehicle-to-vehicle) services, V2P (vehicle-to-pedestrian) services, V2I (vehicle-to-infrastructure) services, and V2N (vehicle-to-network) services. Different from conventional services for stationary or low-speed equipment, V2X services have exclusive transmission features and key performance indicators (KPIs). To reflect how various V2X services influence the performance of the internet of vehicles (IoV), representative use cases are summarized in detail in Table I. It is not difficult to see that there are extremely diversified and even conflicting service characteristics among the use cases, which poses critical pressure on the networking infrastructure [4].
Network slicing has emerged as a promising paradigm to meet diverse service demands. It enables multiple independent logical networks (i.e., slices) to run on a common physical network infrastructure [5], [9]. However, as V2X applications advance, the predefined slice for ultra-reliable low-latency communications (URLLC) can hardly meet increasingly stringent and heterogeneous service characteristics via one-shot resource allocation [6], [7], [8]. Building on the advancements of existing studies, three types of slices are proposed to accommodate existing and future V2X use cases without excessively segmenting network resources. Specifically, the slices for basic road safety services, enhanced road safety services, and non-safety related services are used to deliver basic driving information, achieve high-level automatic driving, and improve driving comfort and efficiency, respectively. The illustration of representative use cases and their corresponding slices is depicted in Fig. 1.
Unfortunately, constrained by computing capability or transmission delay, it is difficult to process multiple tasks by a single paradigm [11], [12], [13]. Multi-tier computing, as a new system-level computing architecture, provides a new resolution to the problem. It involves three tiers, with users at tier one, the edge cloud at tier two, and the remote cloud at tier three [14]. By reasonably orchestrating available resources along this continuum, the strict KPIs of each slice are expected to be met. However, exploiting this hierarchical computing architecture for service provisioning entails joint allocation of multi-dimensional resources [15], [16]. Besides, the high mobility of vehicles introduces more complexity to resource management [17], [18], [19]. How to effectively allocate multi-tier resources to multiple slices according to time-varying network conditions is a thorny problem.

Fig. 1. Illustration of typical V2X applications in vehicular networks. The basic road safety services slice provides position, heading, speed, etc.; the enhanced road safety services slice provides raw sensor data, vehicle intention data, coordination, confirmation of future maneuvers, and so on; the non-safety related services slice provides traffic flow optimization and software updates. An exclusive "slice sandwich" for each V2X service slice is made up of jagged multi-dimensional resources.
To cope with the problem, existing studies usually adopt hierarchical resource allocation methods [20], [21], [22], [23], [24]. Although these studies obtained certain results in improving resource utilization, they are not applicable to the IoV, because they ignored the exclusive characteristics of V2X services and the importance of multi-tier dynamic resources. Thus, considering the spatiotemporal correlation between service traffic and physical resources [25], a two-timescale intelligent resource management scheme (2Ts-IRMS) is proposed. Specifically, the scheme is divided into two stages, namely inter-slice resource configuration and intra-slice resource scheduling. At the beginning of each large timescale (i.e., period), the infrastructure provider (InP) configures resources for service providers (SPs) according to the service traffic. Due to the long-term trend of the service traffic, the configuration policy remains unchanged within each large timescale. The SPs create customized slices with the obtained multi-dimensional resources. Because the inherent characteristics of the slices make their demands for multi-dimensional resources appear jagged, the shape of a "slice sandwich" is naturally formed. Then, to adapt to the real-time status of the physical layer, each SP dynamically schedules its available resources at each small timescale (i.e., slot) of a large timescale to provide high-quality services for its subscribers. In this way, the system revenue can be maximized while guaranteeing the delay and reliability requirements of mobile users.
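The two-stage control structure described above can be sketched as a nested loop over periods and slots; `configure` and `schedule` below are hypothetical placeholders for the InP's per-period configuration and the SPs' per-slot scheduling, respectively:

```python
def run_two_timescale(num_periods, slots_per_period, configure, schedule):
    """Skeleton of the dual-timescale scheme.

    configure(k) -> a per-slice resource configuration that stays fixed
    for the whole large-time period k (inter-slice configuration);
    schedule(k, t, config) adapts offloading and resource allocation to
    the instantaneous network state at each small-time slot t
    (intra-slice scheduling).
    """
    log = []
    for k in range(num_periods):
        config = configure(k)              # large timescale: fixed over period k
        for t in range(slots_per_period):
            schedule(k, t, config)         # small timescale: per-slot adaptation
            log.append((k, t, config))
    return log
```

The key property mirrored here is that the configuration object does not change between slots of the same period, while the scheduler runs every slot.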
It is noted that the resource configuration made by the InP will influence the scheduling process of the SPs; meanwhile, the performance of the SPs will also affect the decision-making of the InP. The interaction between the InP and the SPs makes it very challenging to apply conventional mathematical methods to the proposed problem. Deep Reinforcement Learning (DRL), as an intelligent approach, provides promising solutions to this challenge. In the stage of inter-slice resource configuration, since the status of service requests depends only on the users, it is not changed by the selected resource configuration policy. Thereupon, a Joint Allocation algorithm of Multi-dimensional Resources (JAMR) based on the improved NeuralUCB (Neural bandits with Upper Confidence Bounds) approach is proposed. The algorithm effectively avoids the curse of dimensionality and learns the unknown system revenue. As for the intra-slice resource scheduling problem, state transitions need to be considered, because the scheduling policies of resource allocation and task offloading will generate different effects on the physical layer states. To adapt to the time-varying physical layer, a Joint Offloading and Resource Allocation algorithm based on the Double Deep Q Network (JORA-DDQN) approach is proposed to obtain optimal scheduling policies. The major contributions of this paper are summarized as follows.
• Three types of refined network slices for V2X services are proposed to simultaneously accommodate multiple V2X services over a common infrastructure.
• In order to fit reality, real-world road conditions and traffic models are set up in Simulation of Urban Mobility (SUMO). Numerical experiments using PyTorch verify that the proposed scheme can utilize network resources more economically and efficiently.
The rest of this paper is organized as follows. Section II presents an overview of the related works. In Section III, we describe the considered system framework. Section IV presents the two-timescale resource allocation problem. In Section V, the solutions based on JAMR and JORA-DDQN are proposed. Section VI evaluates the network performance and compares it with some benchmarks. Finally, Section VII concludes the paper.

II. RELATED WORK

A. Network Slicing for V2X Services
Up to now, the 3GPP has defined standardized slices to support enhanced mobile broadband (eMBB), URLLC, and massive machine type communication (mMTC) [29], [30]. With the evolution of V2X services, more and more rigorous and heterogeneous KPIs need to be satisfied. Mapping V2X services into existing reference slices or a single V2X slice is no longer appropriate [32], [33]. Network slicing for concrete application scenarios is still emerging, especially for vehicular scenarios [34]. In [35], the authors customized slices for safety and non-safety V2X services, respectively. According to the sensitivity of V2X services to delay, Wu et al. proposed delay-sensitive and delay-tolerant slices [36]. As described in Table I, there are great differences between basic road safety services and enhanced road safety services. One slice for safety or delay-sensitive V2X services is still insufficient to simultaneously cope with these differences.
To deal with this problem, Campolo et al. designed four slices for autonomous driving, tele-operated driving, remote diagnostics, and vehicular infotainment [32]. Similarly, the authors of [34] proposed a general network slicing architecture for four typical use cases, namely localization and navigation, transportation safety, autonomous driving, and infotainment services. A common problem of the aforementioned studies is that the validity of the proposed schemes was not verified. The complexity of slice management increases with the number of slices. Dividing V2X services into three slices is a more reasonable solution, which is similar to the slicing approach for traditional mobile services. In [31], Ge et al. proposed three types of service slices, which are used to transmit state-report, event-driven, and entertainment-application messages, respectively. In [39], Cui et al. divided the common network infrastructure into three slices to provide short message service, call service, and internet service for vehicles. Different from the existing studies, the slices proposed in this paper fully consider the exclusive characteristics (i.e., transmission features and KPIs) of V2X services. They can cover all V2X use cases defined in [2], [3] without excessively segmenting resources.

B. Resource Allocation for Network Slicing
In addition to slicing services with benign granularity, it is important to effectively allocate resources among slices. In [42], the authors developed a fuzzy logic-based resource allocation algorithm to simultaneously satisfy the diversified requirements of V2X services. Although the scheme achieved higher resource utilization, its computational complexity is high because the InP directly allocates its resources to users. Most existing studies tend to adopt hierarchical resource allocation methods to reduce the burden on the InP. In [6], Han et al. proposed a two-timescale resource allocation scheme including inter-slice resource pre-allocation in large time periods and intra-slice resource scheduling in small time slots. The scheme achieves a near-optimal tradeoff among the performance of slices. In [20], Mei et al. designed a slicing strategy with two-layer control granularity, where the upper-level and lower-level controllers are used to guarantee the quality of services and improve the spectrum efficiency of each slice, respectively. However, these efforts only concentrate on spectrum resource allocation. The significance of computing resources, which are necessities for satisfying the KPIs of V2X services, is ignored.
To address the multi-dimensional resource allocation issue, Mohammed et al. proposed a multi-dimensional resource slicing scheme [49], in which both the InP and the SPs adopt the dominant resource fairness (DRF) approach to allocate multi-dimensional resources. In [23], the authors introduced a generalized Kelly mechanism (GKM) to address the multi-dimensional resource allocation issue between the InP and SPs. Meanwhile, each SP utilizes Karush-Kuhn-Tucker (KKT) conditions to derive the optimal scheduling strategy for communication resources. Although these studies make progress in improving the aggregate revenue of SPs, they cannot be directly applied to the IoV with multiple V2X services. On the one hand, when the InP treats all slices equally, it is hard to guarantee road safety in real-world situations. On the other hand, the differentiated characteristics of multi-tier computing resources have not been extensively explored, which further reduces the system revenue. In our work, we adopt intelligent approaches to economically allocate multi-tier resources to multiple V2X slices while guaranteeing the delay and reliability requirements of mobile users.

C. DRL-Enabled Network Slicing
In the dynamic IoV, conventional mathematical models face high computational complexity and lack adaptability and robustness. Advanced DRL algorithms have been widely applied in network slicing [38]. From the perspective of the effect of the action on the status, DRL can be divided into DRL based on multi-armed bandits (MAB) and DRL based on Markov Decision Processes (MDP) [26], [27]. Because the policies of resource allocation and task offloading generate different effects on the physical layer states, the intra-slice resource scheduling problem is usually formulated as an MDP problem. In [22], Chen et al. leveraged the DDQN algorithm to learn the optimal policies of packet scheduling and computation offloading. In [20], the authors further verified the effectiveness of DDQN in jointly optimizing resource allocation and computation offloading. In this paper, each SP is equipped with an exclusive agent that implements resource scheduling, which guarantees isolation among slices.
As for the inter-slice resource configuration problem, it is impossible to find the optimal configuration policy before the end of a period, because the future status of the IoV is unknowable. In addition, it is impractical to traverse all configuration policies at each period. Taking advantage of the fact that the resource configuration policy does not change the status of service requests, many studies adopt MAB-based DRL algorithms to learn the unknown reward function. In [44], Zanzi et al. developed a radio slicing orchestration scheme based on MAB. With no prior knowledge of channel quality statistics, SPs can make adaptive slicing decisions. In [45], Zhao et al. formulated resource configuration as a contextual MAB problem and adopted the upper-confidence-bound (UCB) algorithm to solve it. However, these studies assumed a linear relationship between the expected reward and the context vector. Furthermore, the effectiveness of MAB-based DRL algorithms is greatly reduced when the number of candidate actions is large, and the curse of dimensionality is inevitable when multi-dimensional resources are jointly considered. Therefore, in this paper, we design a pre-allocation mechanism based on service priorities and adopt the NeuralUCB algorithm to obtain an optimal configuration policy for multi-dimensional resources.

III. SYSTEM MODEL
This section describes the system model in detail. Specifically, we first present the network model (Section III-A) and the multi-tier resources model (Section III-B) of the IoV. Then, we elaborate on the processes of transmission (Section III-C) and offloading (Section III-D) for vehicular tasks. Finally, the key performance indicators of V2X services are presented (Section III-E). For convenience, Table II summarizes the major notations of this paper.

A. Network Model
The physical infrastructure of the IoV mainly includes a macro base station (MBS) connected to remote cloud servers, roadside units (RSUs) equipped with MEC servers, and vehicular user equipment (VUEs) with diverse numbers of vehicular computing units. Note that an RSU is essentially a static logical entity. It supports V2X applications by using the functionality provided by a 3GPP network or user equipment (UE) [46]. Thus, we assume all UEs, which consist of VUEs and RSUs, are within the coverage of the MBS and that VUEs can only access the internet via RSUs. Let N = {N_0, N_1, ..., N_m, ..., N_M} be the set of UEs covered by RSU N_0. Furthermore, we assume that VUE N_m ∈ N (m ≠ 0) is equipped with Y^v_m central processing unit (CPU) cores. As for RSU N_0, there are Y^u CPU cores deployed at its MEC server, which means that the MEC server can serve at most Y^u VUEs at the same time. f^v and f^u represent the CPU frequency of each CPU core of a VUE and of the MEC server, respectively.
As mentioned in Table I, there are great differences among V2X services. Therefore, we propose three kinds of network slices to reflect these differences without excessively segmenting resources. Specifically, the three kinds of network slices embrace the slice for basic road safety services, the slice for enhanced road safety services, and the slice for non-safety related services. The specific characteristics and requirements of each slice are described as follows.
• The slice for basic road safety services is mainly aimed at services that require high timeliness and reliability but low data rates, such as collision warnings and emergency stops. V2V is the prevalent radio access technology to satisfy the requirements of latency and reliability. Note that the packet size of basic safety services is usually small. Thus, instead of offloading tasks to MEC servers, vehicular computing resources are sufficient to process them.
• The slice for enhanced road safety services aims to enable high-level autopilot. Compared to basic road safety services, this slice requires higher reliability, data rate, and beacon frequency, as well as lower latency. Similarly, to effectively transmit messages among vehicles, low-latency V2V communication is the main communication mode. Due to the limited processing capability of VUEs and the long transmission latency of remote cloud servers, a proportion of data processing should be performed in MEC servers.
• The slice for non-safety related services has low sensitivity to delay and reliability, but usually has high data rate requirements. As a result, it is expected to use multiple access technologies to seek higher throughput and to process tasks in MEC servers or cloud servers.
In this paper, an SP corresponds to a slice and provides a class of V2X services. Therefore, we will not distinguish between the concepts of slice and SP in the following text. To facilitate analysis, let L_i be the set of V2V links subscribed to slice i ∈ I with |I| = 3. Then, L = ∪_{i∈I} L_i denotes the set of all V2V links across the whole network. Each V2V link l ∈ L is composed of a transmitter (VTx) N_l ∈ N and a receiver (VRx) N_l′ ∈ N.

B. Multi-Tier Resources Model
As described above, each SP simultaneously needs computing resources and communication resources to serve its users. The inherent attributes (i.e., KPIs and transmission features) of V2X services make their demands for multi-dimensional resources appear jagged. Thus, jagged resource slicing on the multi-tier computing architecture is adopted in this paper. Generally, the architecture tends to use three tiers, with users at tier one, the edge cloud at tier two, and remote cloud services at tier three. Before determining the most suitable communication method and computing location for any service, the hierarchical and distributed characteristics of multi-dimensional resources should be considered. In the tier of terminal devices, vehicular computing resources usually have relatively small computing capabilities. The purpose of local execution is to reduce the communication delay and errors caused by transmission and protocols. Notably, a VUE can concurrently subscribe to multiple slices in our system model, which is consistent with actual cases. Therefore, let Y^v_{m,i} be the number of vehicular CPU cores allocated to slice i by VUE N_m (m ≠ 0).
As for the edge tier, MEC servers have powerful computing capabilities. However, the computing resources of each MEC server are limited, meaning that only a part of the VUEs can offload their computing tasks to MEC servers over V2I links. To guarantee isolation among slices, the shared edge computing resources Y^u (i.e., CPU cores) and the set of shared wireless communication resources J with |J| = J (i.e., physical resource blocks with bandwidth B) are orthogonally divided into three parts. Let J_i with |J_i| = J_i be the set of wireless communication resources allocated to slice i, and let Y^u_i be the number of CPU cores of the MEC server allocated to slice i. The cloud tier consists of a large number of remote cloud servers, which have sufficient computing resources. Furthermore, the RSUs are connected to the MBS and the cloud computing center via high-speed fronthaul links. Thus, when VUEs decide to offload computing tasks to remote cloud servers, it is reasonable to ignore the constraints on the number of communication and computing resources. To reflect the usage of cloud computing resources, let Y^c_i be the number of VUEs offloading computing tasks to remote cloud servers. Fig. 2 depicts a diagram of the jagged allocation of virtualized resources to multiple V2X slices.

C. Signal Transmission Model
In conventional services, the data rate of large-sized packets can be directly calculated through the Shannon formula. However, unlike conventional services, the packet size of most V2X services is short, ranging from 32 to 200 bytes [20]. Due to the negative effects of channel dispersion and finite coding length, the data rate of a short packet cannot be accurately obtained by the Shannon formula. In [47], based on finite blocklength theory, a new method to approximately calculate the data rate of short packets is proposed. Therefore, the available data rate between VTx N_m ∈ N and VRx N_m′ ∈ N on resource block j ∈ J at slot t can be calculated as formula (1a) for long packet transmission or formula (1b) for short packet transmission, shown at the bottom of this page, where σ² is the power of the additive white Gaussian noise on each resource block (RB), and p_{N_m,j,t} denotes the channel coefficient on RB j at slot t, which contains path loss, Rayleigh fading, and shadowing effects. As for the short packet transmission in formula (1b), V_{N_m N_m′,j,t} is the channel dispersion, which reflects the random variability of the channel and is calculated as V_{N_m N_m′,j,t} = 1 − (1 + γ_{N_m N_m′,j,t})^{−2}, where γ_{N_m N_m′,j,t} is the received signal-to-noise ratio on RB j at slot t. G^{−1}(·) and ε are the inverse of the Gaussian Q-function and the effective decoding error probability, respectively, and τ_{N_m} is the number of transmit symbols. Both are used to reflect the influence of coding on short packet transmission.
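Under the normal approximation from finite blocklength theory, the short-packet rate can be evaluated numerically and compared with the Shannon rate used for long packets. The sketch below uses illustrative parameter names and assumes the standard dispersion V = 1 − (1 + γ)^{−2}:

```python
from math import log, log2, sqrt
from statistics import NormalDist


def long_packet_rate(bandwidth_hz, snr):
    """Shannon capacity in bits/s, valid for long packet transmission."""
    return bandwidth_hz * log2(1.0 + snr)


def short_packet_rate(bandwidth_hz, snr, blocklength, error_prob):
    """Normal approximation to the achievable short-packet rate in bits/s:

        r ≈ B [ log2(1 + γ) − sqrt(V/τ) · Q^{-1}(ε) / ln 2 ],

    with channel dispersion V = 1 − (1 + γ)^{-2} (assumed form),
    blocklength τ in symbols, and decoding error probability ε.
    """
    dispersion = 1.0 - 1.0 / (1.0 + snr) ** 2
    q_inv = NormalDist().inv_cdf(1.0 - error_prob)  # Q^{-1}(ε)
    return bandwidth_hz * (log2(1.0 + snr)
                           - sqrt(dispersion / blocklength) * q_inv / log(2))
```

As the blocklength grows, the penalty term vanishes and the short-packet rate approaches the Shannon rate, which is why the Shannon formula remains adequate for long packets.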
In addition, during the phase of data transmission, each V2V link maintains an individual queue to buffer the arriving packets.
The packets are delivered according to the first-come-first-serve policy [22]. As for link l, the dynamic evolution of its queue can be written as

W_{l,t+1} = min{ max{ W_{l,t} − ⌊ r^v_{l,t} Δt / Z_l ⌋, 0 } + A_{l,t}, W^max_l },

where W_{l,t} is the queue length (i.e., the number of packets) at slot t, A_{l,t} denotes the instantaneous packet arrivals, W^max_l is the maximum length of the buffer queue, and Z_l is the total packet size (in bits). Δt refers to the duration of each slot, during which the quality of the wireless channels remains stable. The term r^v_{l,t} denotes the total rate capacity from VTx N_l to VRx N_l′ at slot t, which can be expressed as

r^v_{l,t} = Σ_{j∈J_i} ρ_{l,j,t} r_{N_l N_l′,j,t},

where ρ_{l,j,t} is a binary variable: ρ_{l,j,t} = 1 denotes that the j-th RB is allocated to link l at slot t, and ρ_{l,j,t} = 0 otherwise.
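The per-slot queue evolution can be sketched directly; names are illustrative, and the number of packets served in a slot is floored, as a partially transmitted packet stays in the queue:

```python
def queue_update(w, arrivals, rate_bps, slot_s, packet_bits, w_max):
    """One-slot evolution of a link's FCFS transmit queue (in packets).

    Packets served this slot = floor(rate * Δt / packet size); new
    arrivals join afterwards, and the buffer truncates at w_max, so any
    excess arrivals are dropped.
    """
    served = int(rate_bps * slot_s // packet_bits)
    return min(max(w - served, 0) + arrivals, w_max)
```

For example, a queue of 5 packets served at 2 Mb/s for 1 ms with 1000-bit packets drains 2 packets before 3 new arrivals join.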

D. Task Offloading Model
In this paper, we consider a hybrid computation offloading scenario [21]. The computing task of a vehicle can be executed locally. It can also be offloaded to the MEC server by V2I communication, or to the remote cloud computing servers through relayed V2I and high-speed fronthaul links. As for the computing task of link l at slot t, let e_{l,t} ∈ {0, 1, 2} be its offloading action. Specifically, e_{l,t} = 0 represents local execution, e_{l,t} = 1 indicates offloading the computing task to the MEC server, and e_{l,t} = 2 means that the offloading position of the task is the remote cloud computing center. Considering that the output size of a computing task is much smaller than its input size, the download time of processed data is ignored [21], [43]. Thus, at slot t, the processing time for the b-th packet of link l ∈ L_i can be described as

D^{cp}_{l,b,t} = β_l Z_l / f^v, if e_{l,t} = 0;
D^{cp}_{l,b,t} = Z_l / r^u_{l,t} + β_l Z_l / f^u, if e_{l,t} = 1;
D^{cp}_{l,b,t} = Z_l / r^u_{l,t} + t^c + β_l Z_l / f^c, if e_{l,t} = 2,

where β_l denotes that the input packet requires β_l cycles/bit for processing, f^c is the CPU frequency of each CPU core of the remote cloud server, and t^c is the network delay between RSU N_0 and the cloud computing center. It is worth noting that r^u_{l,t} is the available transmission rate for link l to upload data to RSU N_0.
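The three offloading branches can be sketched as follows; the piecewise delay model is an assumption consistent with the terms defined above (upload time only for remote execution, extra fronthaul delay t^c only for the cloud), and all parameter names are illustrative:

```python
def processing_time(e, z_bits, beta, f_local, f_edge, f_cloud, r_up_bps, t_cloud_s):
    """Per-packet processing time for the three offloading actions.

    e = 0: local execution on a vehicular CPU core;
    e = 1: offload to the MEC server over a V2I link;
    e = 2: relay to the remote cloud, adding the fronthaul delay t_cloud_s.
    The output download time is ignored, matching the model above.
    """
    compute_cycles = beta * z_bits  # beta cycles/bit for a z_bits packet
    if e == 0:
        return compute_cycles / f_local
    if e == 1:
        return z_bits / r_up_bps + compute_cycles / f_edge
    if e == 2:
        return z_bits / r_up_bps + t_cloud_s + compute_cycles / f_cloud
    raise ValueError("offloading action must be 0, 1, or 2")
```

With a fast uplink and a much faster edge core, MEC offloading beats local execution; once the fronthaul delay dominates, the cloud branch only pays off for compute-heavy tasks.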

E. Key Performance Indicators
As defined in [3], whole end-to-end (E2E) communication refers to the process that transfers a given piece of information from a source to a destination at the application level. Generally, the E2E delay consists of waiting time in the queue, transmission time, network latency, and processing latency [48]. In this paper, we have assumed that all VUEs are within the coverage of the RSUs and can only obtain data from RSUs. Consequently, it is reasonable to ignore the network delay during the process of data receiving. Thus, we mainly consider the waiting, transmission, and processing delays. At slot t, the E2E delay of the b-th packet of link l can be written as

D_{l,b,t} = D^{cw}_{l,b,t} + D^{ct}_{l,b,t} + D^{cp}_{l,b,t},

where D^{cw}_{l,b,t} denotes the queuing delay at VTx N_l, D^{ct}_{l,b,t} refers to the transmission time between VTx N_l and VRx N_l′, and D^{cp}_{l,b,t} is the processing time.
To reflect the delay state of link l at slot t, let D_{l,t} be the average packet delay of queue W_{l,t}. In addition to delay, reliability is another key performance indicator [10]. From the view of service provisioning, the probability of receiving or dropping data packets is usually used as a measure of reliability [42]. When the delay of a packet exceeds the maximum tolerable delay, the packet is considered dropped; otherwise, it is considered received. In this paper, we choose the packet reception ratio (PRR) as the index to evaluate reliability, which can be expressed as

ϕ_{l,t} = (1 / B_{l,t}) Σ_{b=1}^{B_{l,t}} 1{ D_{l,b,t} ≤ D^max_l },

where D^max_l is the maximum tolerable E2E delay of link l and B_{l,t} is the number of packets delivered by link l at slot t.
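Counting late packets as dropped, the PRR reduces to the fraction of per-packet E2E delays within the deadline; the empty-traffic convention below is a modeling choice of this sketch, not from the source:

```python
def packet_reception_ratio(delays_s, d_max_s):
    """PRR of a link over one slot: the fraction of packets whose E2E
    delay meets the maximum tolerable delay; packets whose delay exceeds
    d_max_s count as dropped."""
    if not delays_s:
        return 1.0  # no traffic this slot: nothing violated (assumption)
    received = sum(1 for d in delays_s if d <= d_max_s)
    return received / len(delays_s)
```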

IV. PROBLEM FORMULATION
In this paper, the resource allocation problem is decomposed into two stages. First, at the beginning of each large-time period k, the InP jaggedly allocates the shared physical resources to the SPs (Section IV-A). Then, at each small-time slot t, each SP independently manages the acquired resources to provide services for its users (Section IV-B). Fig. 3 illustrates the process of two-timescale resource allocation.

A. Large Timescale Problem Formulation
At the beginning of each period k ∈ K, the multi-tier resources are jaggedly allocated to slices to maximize the system revenue, which consists of the fees charged by the SPs minus the fees paid for accessing resources. As for slice i ∈ I, let c^v_{i,k} = {Y^v_{m,i,k}}, c^e_{i,k} = {J_{i,k}, Y^u_{i,k}}, and c^c_{i,k} = {Y^c_{i,k}} represent the resource configurations for the terminal tier, edge tier, and cloud tier, respectively. From the perspective of privacy and security, the computing resources of vehicles can only be used by themselves; thus, it is reasonable to ignore the cost of using vehicular computing resources. Let q^cc be the price for each user to access the cloud computing center, and let q^cm and q^cp be the prices of a unit of communication resources and a unit of edge computing resources, respectively. Let C_k denote the set of multi-dimensional resource configurations for all slices. Based on the above analysis, the problem of the InP allocating multi-tier resources to the SPs can be formulated as

P1: max_{C_k} V(C_k) = Σ_{i∈I} ( Φ_{i,k} − q^cm J_{i,k} − q^cp Y^u_{i,k} − q^cc Y^c_{i,k} )

subject to constraints C1–C9, where Φ_{i,k} denotes the total fee charged by SP i during period k (determined by the intra-slice scheduling in Section IV-B), Y^u_{i,k} is the number of edge computing resources allocated to slice i at period k, and Y^v_{m,i,k} is the number of vehicular computing resources allocated to slice i by VUE N_m at period k. C1–C3 respectively guarantee that the allocated resources do not exceed the capacities of the vehicular computing resources Y^v_m, communication resources J, and edge computing resources Y^u:

C1: Σ_{i∈I} Y^v_{m,i,k} ≤ Y^v_m, ∀m;  C2: Σ_{i∈I} J_{i,k} ≤ J;  C3: Σ_{i∈I} Y^u_{i,k} ≤ Y^u.

C4–C6 constrain the values of Y^v_{m,i,k}, J_{i,k}, and Y^u_{i,k}, respectively, ensuring that the number of resources allocated to each SP is a non-negative integer. C7–C9 ensure isolation among all slices. It is noteworthy that the cloud tier contains a large number of cloud servers and the RSUs connect to the cloud computing center via high-speed fronthaul links. As a result, the resource constraints on offloading tasks from the edge tier to the cloud tier can be ignored.
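The per-period revenue balance and the shared-pool capacity checks can be sketched as follows; slice fees are passed in as already-computed totals, and all names, prices, and configurations are illustrative:

```python
def inp_revenue(slice_fees, configs, q_cm, q_cp, q_cc):
    """System revenue for one period: fees earned by the SPs minus the
    resource-access cost of edge spectrum (q_cm per RB), edge CPU cores
    (q_cp per core), and cloud access (q_cc per offloading user).
    `configs` maps each slice id to a tuple (J_i, Y_u_i, Y_c_i);
    vehicular cores carry no cost, matching the model above."""
    cost = sum(q_cm * j + q_cp * yu + q_cc * yc
               for j, yu, yc in configs.values())
    return sum(slice_fees.values()) - cost


def feasible(configs, j_total, y_u_total):
    """Capacity constraints on the shared pools: total RBs and total edge
    cores allocated across slices must not exceed J and Y^u."""
    return (sum(j for j, _, _ in configs.values()) <= j_total
            and sum(yu for _, yu, _ in configs.values()) <= y_u_total)
```

The InP's search over configurations amounts to maximizing `inp_revenue` over the feasible set defined by `feasible` plus the integrality and isolation constraints.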

B. Small Timescale Problem Formulation
After determining the resource configuration for all slices, each slice utilizes the acquired multi-dimensional resources to provide services for its subscribers so as to maximize the fee it charges. As for slice i ∈ I, q^se_i is the unit price charged to link l for realizing service satisfaction U_{l,t}, which consumes computing resources to process arriving tasks and communication resources to transmit queued packets. Since both delay and reliability are KPIs that weigh the quality of V2X services, the service satisfaction U_{l,t} of link l at slot t can be described as

U_{l,t} = α^d U^d_{l,t} + α^r U^r_{l,t},

where α^d and α^r are weighting factors that balance the importance of delay and reliability. Based on the gap between the theoretical KPIs and the actual indexes, the satisfaction of links follows a negative exponential [38]. Besides, to reflect the negative impact of violating KPIs, a penalty factor ψ_i is introduced into the service satisfaction. In general, the penalty factors for safety-related services are larger than those for non-safety-related services. Thus, for link l ∈ L_i, its delay satisfaction U^d_{l,t} and reliability satisfaction U^r_{l,t} are written as

U^d_{l,t} = exp( −ψ_i max{D_{l,t} − D^max_l, 0} / D^max_l ), U^r_{l,t} = exp( −ψ_i max{ϕ^min_l − ϕ_{l,t}, 0} / ϕ^min_l ),
where ϕ^min_l is the minimum PRR of link l. Note that each period is composed of T slots and C_k remains unchanged during period k. Once the resource configuration is determined by the InP, the remaining problem for each SP is how to maximize the long-term satisfaction of all service requests. Thus, during period k, the problem of SP i ∈ I scheduling resources among its links L_i is formulated as

P2: max_{ρ_i, e_i} Σ_{t=1}^{T} Σ_{l∈L_i} U_{l,t}

subject to constraints C10–C13, where ρ_i = {ρ_{l,j,t}} is the set of RB allocations and e_i = {e_{l,t}} is the set of offloading actions. C10 means that each RB can be assigned to at most one link at each slot t. C11 indicates that the allocated communication resources cannot exceed the obtained communication resources J_{i,k}. C12 implies that the computing task of link l can be handled in only one way, i.e., executed locally, offloaded to the MEC server, or offloaded to remote cloud computing servers. C13 indicates that the edge computing resources allocated to all links cannot exceed the acquired edge computing resources Y^u_{i,k}.
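Since the satisfaction functions are parameterized only by the KPI gaps and the slice's penalty factor ψ_i, one plausible negative-exponential instantiation can be sketched as follows (an assumed form for illustration, not the paper's exact definition):

```python
from math import exp


def delay_satisfaction(d_avg, d_max, psi):
    """Assumed negative-exponential delay satisfaction: full satisfaction
    while the average delay meets the KPI, exponential decay scaled by
    the slice's penalty factor psi once it is violated."""
    if d_avg <= d_max:
        return 1.0
    return exp(-psi * (d_avg - d_max) / d_max)


def service_satisfaction(u_delay, u_rel, alpha_d, alpha_r):
    """Weighted combination of delay and reliability satisfaction,
    mirroring U = alpha_d * U^d + alpha_r * U^r."""
    return alpha_d * u_delay + alpha_r * u_rel
```

A larger ψ_i, as used for safety-related slices, makes the satisfaction (and hence the SP's fee) drop faster for the same KPI violation.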

V. DUAL TIMESCALE INTELLIGENT RESOURCE MANAGEMENT SCHEME
The optimization problems described in Section IV are difficult to solve since they are NP-hard. In addition, the IoV needs an intelligent resource management scheme to adapt to dynamic network conditions. Hence, in this section, a novel 2Ts-IRMS is proposed to address the resource allocation problem in the IoV. Specifically, we adopt the proposed JAMR algorithm to address inter-slice resource configuration at each large-time period (Section V-A), while the JORA-DDQN algorithm is used to solve intra-slice resource scheduling at each small-time slot (Section V-B).

A. Inter-Slice Resource Configuration
At large timescales, a central question is how the InP allocates multi-tier resources to SPs to maximize system revenue. Obviously, it is impossible to find the optimal resource configuration of P1 in (10) before the end of period k, because the future state of the IoV is unknowable and cannot be obtained in advance. In addition, we cannot acquire all V(C_k) (∀C_k ∈ C) for each period in practice. Luckily, within a specific region, the long-term trend of network conditions can be characterized by service requests [20]. Meanwhile, the selection of a resource configuration does not change the state of service requests. Consequently, it is reasonable to formulate P1 as a contextual MAB problem. However, the computation complexity is extremely high when the InP simultaneously allocates all network resources. To address this challenge, based on service requests, the problem of multi-tier resource allocation is approximately decomposed into several subproblems, each focusing on the characteristics of resources at a different tier.
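To make the benefit of this decomposition concrete, the following back-of-the-envelope sketch compares the size of the joint arm space with the decomposed one. The number of discretization levels per tier is our illustrative assumption, not a value from the paper.

```python
# Hypothetical number of discrete configuration levels per resource tier.
levels = {"vehicular_cpu": 4, "edge_cpu": 6, "radio_rbs": 10}

# Joint allocation: one contextual-bandit arm per combination of levels.
joint_arms = 1
for n in levels.values():
    joint_arms *= n          # 4 * 6 * 10 = 240 arms

# Decomposed allocation: one subproblem per tier, so arm counts add.
decomposed_arms = sum(levels.values())   # 4 + 6 + 10 = 20 arms

print(joint_arms, decomposed_arms)  # 240 20
```

The gap widens multiplicatively as more tiers or finer discretizations are added, which is why the per-tier subproblems remain tractable for a bandit learner.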
1) Vehicular Computing Resources: Different from other application scenarios, the IoV contains a large number of safety-related services, whose resource requirements must be guaranteed first to avoid traffic accidents. As described in Section III, when vehicular computing resources are sufficient, local execution is the first choice for processing the computing tasks of safety-related services, since it avoids unnecessary transmission delay and errors. Thus, based on service priorities, we first explore vehicular computing resource allocation among slices. Obviously, safety-related services have a higher service priority than non-safety-related services in a practical IoV. Among safety-related services, we consider that basic road safety services have a higher priority than enhanced road safety services, because basic driving functions should be guaranteed first when computing resources are insufficient. To facilitate analysis, the slice for basic road safety services, the slice for enhanced road safety services, and the slice for non-safety services are denoted as i_1, i_2, and i_3, respectively. These indexes reflect the service priority, and vehicular computing resources are allocated in this order. The specific flow of the inter-slice vehicular computing resource configuration is shown in Fig. 4.
It is noteworthy that the real-time policies of task offloading and resource scheduling at small timescales have little dependence on the vehicular computing resource configuration. Furthermore, the computing resources of each vehicle can only be used by that vehicle. Therefore, service requests can be considered the only influencing factor for vehicular computing resource configuration. As for VUE N_m (m ≠ 0), the number of required vehicular computing resources Y^{v,req}_{m,i,k} for slice i ∈ I at period k can be approximately calculated, where ⌈x⌉ = min{X ∈ Z | X ≥ x} is the ceiling function of x, L_{i,k} is the link set of slice i at period k, and Λ_(l=m) indicates whether the condition (l = m) is satisfied. Specifically, Λ_(l=m) = 1 denotes that the VRx N_l of link l is VUE N_m; otherwise, Λ_(l=m) = 0. Since vehicular computing resources are allocated to slices in turn according to service priorities, the number of vehicular computing resources available to each slice differs. Let Y^{v,rem}_{m,i,k} be the number of vehicular computing resources of VUE N_m available to slice i ∈ I, obtained after serving the slice whose priority is one level higher than slice i. After the available and required vehicular computing resources of all slices are determined, the number of vehicular computing resources Y^v_{m,i,k} allocated from VUE N_m to slice i at period k can be obtained. Furthermore, to reflect whether the vehicular computing resource requirements of slice i are satisfied, we define the state of the vehicular computing resource allocation at period k as E^v_{m,i,k}: E^v_{m,i,k} = 1 indicates that the vehicular computing resource requirement of slice i on VUE N_m is satisfied; otherwise E^v_{m,i,k} = 0. The details of the inter-slice vehicular computing resource configuration based on service priorities are described in Algorithm 1.
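The priority-ordered allocation just described can be sketched as follows. This is a minimal illustration of the idea behind Algorithm 1, not the paper's implementation; the slice labels, demand figures, and core count are illustrative.

```python
def allocate_vehicular_cores(required, available):
    """Priority-ordered vehicular computing resource allocation on one VUE.

    required:  {slice: cores needed on this VUE}  (the Y^{v,req} quantities)
    available: total CPU cores of the VUE
    Returns ({slice: allocated cores}, {slice: satisfied flag E^v}).
    Slices are served in decreasing priority: i1 (basic safety) first,
    then i2 (enhanced safety), then i3 (non-safety).
    """
    allocated, satisfied = {}, {}
    remaining = available                    # Y^{v,rem}: shrinks as higher priorities are served
    for s in ["i1", "i2", "i3"]:             # fixed service-priority order
        give = min(required.get(s, 0), remaining)
        allocated[s] = give
        satisfied[s] = int(give >= required.get(s, 0))   # E^v = 1 iff demand fully met
        remaining -= give
    return allocated, satisfied

alloc, ok = allocate_vehicular_cores({"i1": 2, "i2": 4, "i3": 3}, available=8)
print(alloc, ok)  # {'i1': 2, 'i2': 4, 'i3': 2} {'i1': 1, 'i2': 1, 'i3': 0}
```

Note how the lowest-priority slice i3 absorbs the shortfall when the VUE's cores run out, matching the design intent that basic driving functions are protected first.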

2) Edge Computing Resources & Radio Communication Resources:
In order to ensure that each computing task has a processing location and to avoid wasting resources, the multi-tier computing resource configuration of slice i ∈ I must satisfy a conservation constraint, where L_{i,k} = |L_{i,k}| is the total number of pending computing tasks at period k and Y^v_{i,k} denotes the number of VUEs with sufficient vehicular computing resources allocated to slice i to process the relevant computing tasks. After the vehicular computing resource configuration is determined, the large-timescale problem reduces to the InP adjusting the configuration of edge computing resources and radio communication resources among slices.
As described above, we formulate problem P1 in (10) as a contextual MAB problem. Specifically, at the beginning of each period, the InP, considered as the agent, first observes its context in the form of a feature vector. The vector characterizes the requested services and the vehicular computing resource allocation status of all slices. At period k ∈ [K], the feature vector for an arm contains the average packet payload, average computing task workload, and average packet beacon frequency of the links in slice i, together with the following quantities: L_{i,k} is the total number of pending computing tasks at period k; Y^v_{i,k} denotes the number of VUEs that allocate sufficient vehicular computing resources to slice i; J_{i,k} is the number of total RBs allocated to slice i; and Y^u_{i,k} is the number of CPU cores of the MEC server allocated to slice i.
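A per-slice block of the context vector can be assembled as below. The key names are our illustrative stand-ins for the paper's symbols (avg_payload, avg_workload, avg_beacon_freq for the traffic statistics; n_tasks, n_local_vues, n_rbs, n_edge_cores for L_{i,k}, Y^v_{i,k}, J_{i,k}, Y^u_{i,k}), and the numbers are made up for the example.

```python
import numpy as np

def slice_context(stats):
    """Assemble the per-slice part of the context feature vector O_{k,C}.
    The ordering is arbitrary but must stay fixed so the bandit's
    reward predictor sees consistent inputs across periods."""
    return np.array([stats["avg_payload"], stats["avg_workload"],
                     stats["avg_beacon_freq"], stats["n_tasks"],
                     stats["n_local_vues"], stats["n_rbs"],
                     stats["n_edge_cores"]], dtype=float)

x = slice_context({"avg_payload": 300.0, "avg_workload": 1e6,
                   "avg_beacon_freq": 10.0, "n_tasks": 12,
                   "n_local_vues": 5, "n_rbs": 20, "n_edge_cores": 4})
print(x.shape)  # (7,)
```

In practice the blocks of all slices would be concatenated (and typically normalized) into one context vector per candidate arm.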
Then, the agent pulls an available arm C from the candidate set of multi-tier resource configurations C with the aid of the context information. After pulling an arm, the agent observes the reward V_k(C_k) of the selected arm C_k, but the rewards of the other arms remain unknown. Over time, the agent aims to collect enough information about the relationship between context vectors and rewards to predict the next best arm from the current context. However, linear contextual bandits often fail to fit this relationship in practice, because they assume that the expected reward at each period is linear in the feature vector. Thus, in this paper, NeuralUCB is adopted to solve the resource configuration problem.
The key idea of NeuralUCB is to use a neural network f(O_{k,C_k}; ω_{k−1}) to predict the reward of context O_{k,C_k} and to compute upper confidence bounds that guide exploration [28]. Specifically, at period k, the upper confidence bound P_{k,C_k} for each arm C_k ∈ C can be computed as in (18), shown at the bottom of the next page, where ω_{k−1} denotes the parameters of the current neural network, the width and depth of the network are fixed hyperparameters, and H^{−1}_k is the inverse of matrix H_k. It is worth noting that the scaling factor μ_k is composed of two parts: a confidence radius similar to that of linear UCB, and a function approximation error term newly added to handle the unknown nonlinear function. The exploration parameter ϑ controls whether the choice is inclined to explore or exploit: the larger ϑ, the more the action choice is inclined to explore. The detailed expression of μ_k is given in (19), shown at the bottom of this page. Herein, η is the step size of gradient descent and the number of gradient descent steps is a further hyperparameter; χ_1, χ_2, and χ_3 are the regularization parameter, confidence parameter, and norm parameter, respectively; χ_4, χ_5, and χ_6 are experimental parameters. After the upper confidence bounds of all arms are determined, the arm C_k with the largest P_{k,C_k} is chosen and the agent receives the corresponding reward V_k(C_k). Then, NeuralUCB updates H_k, and at the end of period k, the neural network parameter ω_k is updated by gradient descent to approximately minimize the loss L_NU(ω).
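The selection rule can be sketched as follows. This is a deliberately simplified NeuralUCB loop: the reward predictor is a tiny fixed one-hidden-layer network, and the confidence width uses the gradient with respect to the output layer only (full NeuralUCB uses the gradient over all weights and trains the network each period). All sizes and the scaling factor are illustrative, not the paper's tuned values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny one-hidden-layer network f(x; w) standing in for the reward predictor.
d, width = 5, 16
W1 = rng.normal(scale=1 / np.sqrt(d), size=(width, d))
w2 = rng.normal(scale=1 / np.sqrt(width), size=width)

def f_and_grad(x):
    """Forward pass and gradient of f w.r.t. the output-layer weights w2."""
    h = np.tanh(W1 @ x)
    return w2 @ h, h          # f(x; w), df/dw2

lam, mu = 1.0, 1.0            # regularizer and (simplified) scaling factor mu_k
H = lam * np.eye(width)       # running design matrix H_k

def select_arm(contexts):
    """Pick the arm maximizing f(x) + mu * sqrt(g^T H^{-1} g)."""
    Hinv = np.linalg.inv(H)
    scores = []
    for x in contexts:
        pred, g = f_and_grad(x)
        scores.append(pred + mu * np.sqrt(g @ Hinv @ g))
    return int(np.argmax(scores))

def update(x_chosen):
    """Rank-one update H <- H + g g^T after observing the chosen arm."""
    global H
    _, g = f_and_grad(x_chosen)
    H += np.outer(g, g)

arms = [rng.normal(size=d) for _ in range(4)]
a = select_arm(arms)
update(arms[a])
```

Arms whose gradient direction has been seen rarely get a large bonus sqrt(g^T H^{-1} g), so the rule explores under-sampled configurations before exploiting the predicted best one.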

B. Intra-Slice Resource Scheduling
In a period, once the resource configuration is determined, the remaining problem is how to effectively allocate the resources of an SP to its UEs to maximize the satisfaction of all links. The optimization problem P2 in (14) is difficult to solve due to the time-varying nature of the physical layer. Besides, the decisions on task offloading and resource scheduling cause changes in link states (e.g., queue characteristics and channel quality), and service satisfaction also depends on link states. Therefore, we utilize a DRL method based on an MDP to solve the proposed intra-slice resource scheduling problem. First, we formulate the problem as an MDP to accurately describe the process of resource allocation and task offloading.
At each slot t during the k-th period, links send the information of their service requests and available resources to their subscribed slice i ∈ I. We define the state of link l ∈ L_{i,k} at slot t as s_{l,t} = {W_{l,t}, A_{l,t}, Z_l, β_l, D^max_l, ϕ^min_l, r^v_{l,t}, r^u_{l,t}, Y^v_{l,i,k}}. Herein, W_{l,t} is the queue length, A_{l,t} denotes the number of instantaneously arriving packets, Z_l is the total packet size, D^max_l is the maximum E2E delay, and ϕ^min_l is the minimum PRR. β_l denotes that the input packet requires β_l cycles/bit for processing. r^v_{l,t} and r^u_{l,t} are the available rates for V2V and V2I transmission, respectively. Y^v_{l,i,k} is the number of vehicular computing resources of VUE N_l allocated to slice i. Thus, the state space of slice i at slot t is the collection of its link states. For link l at slot t, its action a_{l,t} contains the task offloading action and the RB allocation policy, which can be expressed as
a_{l,t} = {e_{l,t}, ρ_{l,j_1,t}, ρ_{l,j_2,t}, ..., ρ_{l,j_{J_i},t}}. The term e_{l,t} is the offloading action of link l at slot t, and ρ_{l,j,t} indicates whether the j-th RB is allocated to link l at slot t. Therefore, the action space of slice i is the collection of its link actions. The goal of resource allocation at this stage is to maximize the satisfaction level of all links within the limited resources. Therefore, we set rewards based on the constraint conditions and the objective function, and define the reward obtained after taking action A_t accordingly. In the IoV, each slice is regarded as an agent and owns a private neural network. Each agent aims to find the best policy π to maximize the expected cumulative reward E[R_t | s, π] for each state s. The cumulative discounted reward is computed with a discount parameter γ, which reflects the importance of future rewards and is restricted between 0 and 1.
A smaller γ means that the agent mostly cares about the instantaneous reward. In value-based reinforcement learning, the state-action function Q^π(s, a), named the quality function (Q-function), is commonly used to reflect how good policy π is when taking action a in the current state s. The Q-function Q(s, a) yields the optimal policy π* by selecting the action a that maximizes the Q-value for state s. Based on the definitions above, we can seek out the optimal policy π* via the recursive nature of the Bellman equation.
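As a small concrete check of the discounted-return definition above, the standard recursion R_t = r_t + γ R_{t+1} can be evaluated backwards over a reward sequence (a minimal sketch; the reward values are illustrative):

```python
def discounted_return(rewards, gamma=0.9):
    """Cumulative discounted reward R_t = sum_{n>=0} gamma^n * r_{t+n},
    computed via the backward recursion R = r + gamma * R."""
    R = 0.0
    for r in reversed(rewards):
        R = r + gamma * R
    return R

print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1 + 0.5 + 0.25 = 1.75
```

With γ close to 0 the sum collapses to the immediate reward; with γ close to 1 all future satisfaction terms count almost equally, which is the regime the long-term SLA objective targets.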
However, in high-dimensional state spaces, the classic Q-learning method cannot efficiently compute the Q-function for all states. To remedy this problem, DDQN improves Q-learning by combining neural networks with Q-learning [41]. Specifically, raw data is input into the neural networks as the state, and the Q-function is approximated by deep neural networks. It is worth noting that DDQN has two separate networks: the main network and the target network. The main network approximates the Q-function, while the target network gives the temporal difference (TD) target for updating the main network. During the training phase, the main network parameters θ are updated after every action, while the target network parameters θ− are updated after a certain period. At each iteration, the main Q-network is trained toward the target value by minimizing a mean-squared error (MSE) loss function, which measures how closely Q(s, a; θ) satisfies the Bellman equation. Once θ is determined, the agent outputs near-optimal resource allocation strategies and computation offloading decisions using a discrete set of approximate action values. The details of JORA-DDQN are described in Algorithm 3 and depicted in Fig. 6.
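The main/target update at the core of the scheme can be sketched with a linear Q(s, ·; θ) standing in for the deep networks. This is a simplified, illustrative double-DQN-style step, not the paper's JORA-DDQN implementation; all sizes, the learning rate, and the transition are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
n_s, n_a, gamma, lr = 4, 3, 0.9, 0.1
theta = rng.normal(size=(n_a, n_s))        # main network parameters
theta_tgt = theta.copy()                   # target network (synced every G steps)

def q(params, s):
    """Q-values of all actions under a linear approximator."""
    return params @ s

def td_update(s, a, r, s_next):
    """One TD step: the main net picks the greedy next action, the target
    net evaluates it, and the MSE loss (y - Q(s,a;theta))^2 is reduced
    by a single gradient step on the main parameters."""
    a_star = int(np.argmax(q(theta, s_next)))          # action selection: main net
    y = r + gamma * q(theta_tgt, s_next)[a_star]       # TD target: target net
    err = y - q(theta, s)[a]
    theta[a] += lr * err * s                           # gradient step on the MSE loss
    return err

s0, s1 = rng.normal(size=n_s), rng.normal(size=n_s)
e1 = td_update(s0, 0, 1.0, s1)
e2 = td_update(s0, 0, 1.0, s1)
```

Keeping θ− frozen between syncs stabilizes the moving TD target, which is exactly the role the target network plays in the training loop of Algorithm 3.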

A. Simulation Environment
In our simulation, we utilize PyTorch 1.10.0 on Ubuntu 18.04.6 LTS to implement the 2Ts-IRMS algorithm and compare it with multiple baseline algorithms. For experimental purposes, a cellular V2X network environment based on the SUMO platform is established, which consists of a real road network, an MBS, and several VUEs and RSUs. Specifically, to fit reality, we first import the road network around the Beijing University of Posts and Telecommunications from OpenStreetMap into SUMO [40]. Then, the whole road network is divided into 9 blocks, consistent with the road partitioning strategy of the Manhattan case [50]. An RSU is deployed in the center of each block and can communicate with vehicles within its coverage, as depicted in Fig. 7. In order to reflect traffic in urban regions as closely as possible, vehicles randomly choose departure lanes, departure positions, and departure speeds when entering the generated road network, and follow the Krauss car-following model and the LC2013 lane-changing model for movement [37].

Algorithm 3: JORA-DDQN for Intra-Slice Resource Scheduling.
1: Initialization: main network weights θ, target network weights θ−, experience replay buffer.
2: for episode = 1, 2, ..., E do
3:   Receive the initial observation s;
4:   for t = 1, 2, ..., T do
5:     Take action a_t = argmax_a Q^π(s_t, a; θ) with probability 1 − ε, or a random action with probability ε;
6:     Get reward r_t and observe the next state s_{t+1};
7:     Store the experience (s_t, a_t, r_t, s_{t+1}) into the experience replay buffer;
8:     Get a batch of U samples (s_t, a_t, r_t, s_{t+1}) from the replay memory;
9:     Calculate the target Q-value y^target_t from the target network by (28);
10:    Update the main network by minimizing the loss L_DDQN(θ) in (27) and performing a gradient descent step on L_DDQN(θ) with respect to θ;
11:    Every G steps, update the target network: θ− = θ;
12:  end for
13: end for
For the communication resources in the IoV, we assume that there are 50 RBs with 180 kHz bandwidth to be allocated. For the computing resources, we let the computation capacity (i.e., CPU frequency) of a single CPU core of a VUE be 10^8 cycles/s, with the number of CPU cores of each VUE uniformly selected from the set {1, 2, 4, 8}. Similarly, the computation capacity of a single CPU core of the RSU is 10^9 cycles/s, and the number of CPU cores of the RSU is fixed to 8. For the services required by the UEs, we assume six typical V2X use cases, whose transmission characteristics and KPIs follow Table I.
In addition, to better evaluate how different V2X services affect network performance, various combinations of services have been considered in the simulations. By selecting these combinations, we test whether the proposed 2Ts-IRMS can satisfy the requirements of multiple services, especially safety-related services. For simplicity, slices 1, 2, and 3 represent the slice for basic road safety services, the slice for enhanced road safety services, and the slice for non-safety-related services, respectively. The simulation parameters and neural network parameters are summarized in Tables III and IV, respectively. Afterward, we compare the 2Ts-IRMS algorithm with the following baseline algorithms:
• Hierarchical resource allocation schemes: The two-timescale bidding resource management scheme (2Ts-BRMS) adopts the generalized Kelly mechanism (GKM) to address the inter-slice multi-dimensional resource configuration problem and allocates resources to users according to channel quality (CQ) [23]. In the two-timescale fair resource management scheme (2Ts-FRMS), both the InP and SPs adopt the dominant resource fairness (DRF) approach to allocate multi-dimensional resources [49].
• Inter-slice resource configuration schemes: The proportional allocation scheme (PA) proportionally allocates resources to slices based on the number of subscribers and average resource requirements [51]. The context-aware configuration scheme (CA) adjusts the inter-slice resource configuration based on localized service requests and the traditional UCB algorithm [45].
• Intra-slice resource scheduling schemes: For communication resource allocation, the queue-aware resource allocation strategy (QA) calculates the queue length of each link; the longer the queue, the more communication resources are allocated. In the fair resource allocation strategy (FA), the communication resources are shared equally by all links. For computing resource allocation, the local execution scheme (LE), the edge execution scheme (EE), and the cloud execution scheme (CE) force all tasks to be executed on user terminals, MEC servers, and remote cloud servers, respectively [21].
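For concreteness, here is one plausible reading of the QA baseline: RBs split across links in proportion to queue length, with the remainder given to the longest queues. This is our illustrative interpretation, not the cited implementation, and the queue lengths are made up.

```python
def queue_aware_rbs(queues, n_rbs):
    """Queue-aware (QA) baseline sketch: allocate n_rbs across links in
    proportion to queue length; leftover RBs from integer rounding go to
    the links with the longest queues."""
    total = sum(queues.values())
    if total == 0:
        return {l: 0 for l in queues}
    alloc = {l: (q * n_rbs) // total for l, q in queues.items()}
    leftover = n_rbs - sum(alloc.values())
    for l in sorted(queues, key=queues.get, reverse=True)[:leftover]:
        alloc[l] += 1
    return alloc

print(queue_aware_rbs({"l1": 30, "l2": 10, "l3": 10}, n_rbs=10))
# {'l1': 6, 'l2': 2, 'l3': 2}
```

The FA baseline would instead assign n_rbs // len(queues) to every link regardless of backlog, which is what makes it a fairness (rather than delay-oriented) reference point.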

B. Simulation Results
Fig. 8 compares the achieved values of the valuation function of the three hierarchical resource allocation schemes under various combinations of services. With a fixed number of VUEs in the cellular network, the proportion of users in slices 1, 2, and 3 iterates through all possible combinations. Compared to the baseline algorithms, 2Ts-IRMS achieves the highest valuation value while showing more stable performance, because it can dynamically adjust the resource allocation according to the varying number of users and service requests. There are two main reasons for the low performance of 2Ts-BRMS. On the one hand, in the inter-slice resource configuration phase, 2Ts-BRMS treats all slices equally, which leads to a failure to meet safety-related services and results in a greater negative impact. On the other hand, when SPs allocate resources to users, considering only communication resources degrades the quality of services, especially for enhanced road safety services. Besides, the value of the valuation function decreases as the number of users increases, because the resources are limited and cannot serve too many users.
To further analyze the impact of the JAMR approach on inter-slice resource configuration, we evaluate the performance of multiple inter-slice resource configuration schemes by adjusting the unit price of a certain service, as depicted in Fig. 9. When the unit prices of the other slices remain unchanged, the system revenue increases with the unit price of the current service. Significantly, JAMR maintains the highest valuation at any price setting, which further validates its self-adaptive capability. For the three proposed slices, JAMR provides average gains of 24%, 40%, and 76% with respect to DRF, GKM, and CA, respectively. The CA scheme, with its limited fitting ability, has lower revenue since its performance is seriously affected by the nonlinearity of the problem. Furthermore, the fluctuation of the system revenue in the slice for non-safety-related services is obviously smaller than in the other slices, because the utility of non-safety-related services has a smaller impact on the system performance. Similarly, Fig. 10 depicts the system revenue of multiple inter-slice resource configuration schemes under different penalty values of services. Although increasing the penalties of services decreases the system revenue, JAMR still maintains the highest revenue no matter how the penalties change.
Fig. 11 shows the number of communication resources (i.e., RBs) and edge computing resources (i.e., CPU cores) allocated to different slices under different inter-slice resource configuration schemes. When the number of vehicles is fixed at 20 and the proportion of users in slices 1, 2, and 3 is 2:2:1, JAMR assigns 10 RBs to slice 1, 28 RBs to slice 2, and 12 RBs to slice 3. At the same time, 25% of the edge computing resources are allocated to slice 3 and all remaining edge computing resources are allocated to slice 2. Significantly, although the PA scheme allocates enough resources to the slice for enhanced road safety services, the performance of the other slices is seriously compromised.
After determining the resource configuration policy for all slices, each SP allocates its obtained resources to its subscribers to maximize the long-term satisfaction of all links. To guarantee isolation among slices, each SP is equipped with an exclusive agent that implements resource scheduling among users based on the proposed JORA-DDQN scheme. To illustrate the convergence of the JORA-DDQN scheme in different slices, we plot the variation of the SLA violation probability over training episodes for each slice in Fig. 12. In this paper, the SLA mainly refers to delay and reliability. At the beginning of training, the SLA violation probability is high; it gradually decreases as the number of training episodes increases. After 1500 episodes, the SLA violation probability levels off, which means that all of the slices have converged. Moreover, the slice for enhanced road safety services has the lowest SLA violation probability, which is consistent with the KPI requirements in Table I.
Fig. 13 depicts the performance of links in the slice for basic road safety services during an episode. It consists of the cumulative distribution function (CDF) of the packet delay, the CDF of the packet reception ratio, and the cumulative satisfaction of links. In view of the characteristics of basic road safety services (i.e., small packet size and high timeliness and reliability), it is more important to allocate radio resources than computing resources, because most tasks can be processed at terminal devices without occupying the computing resources of the MEC server or remote cloud servers. Thus, DRF, CA, QA, and FA are selected as benchmark schemes to compare with the JORA-DDQN approach. Meanwhile, to ensure fairness, the task offloading policy of the benchmark schemes is consistent with JORA-DDQN. Due to its flexible resource management paradigm, the proposed JORA-DDQN scheme significantly outperforms the benchmark schemes in terms of delay, reliability, and cumulative satisfaction. Specifically, the average packet delay of a link is 93.033 ms, and the algorithm maintains the packet reception ratio of each link at 90% or above. Besides, JORA-DDQN has the highest service satisfaction and provides a gain of 50% with respect to DRF. As for the slice for enhanced road safety services, the packet size is much larger than that of basic road safety services, and more CPU cycles are needed to process the data. The computing resources of the terminal devices are insufficient to support the simultaneous processing of so much data, so it is necessary to access the MEC server. Thus, DRF, CA, LE, and EE are selected as benchmark schemes to compare with the JORA-DDQN approach. Similarly, to ensure fairness, the task offloading policy of CA and the RB scheduling policy of LE and EE are consistent with JORA-DDQN. Fig. 14 depicts the performance of links in the slice for enhanced road safety services during an episode. In the proposed scheme, the average packet delay of a link is 9.608 ms, and the algorithm maintains the packet reception ratio of each link at 99% or above. Meanwhile, JORA-DDQN maintains the highest service satisfaction and provides a gain of 52% with respect to DRF. Notably, the LE scheme, with the worst performance, fails to meet the KPI requirements of most links.
As described in Section VI-A, the slice for non-safety-related services is expected to offload computing tasks to the MEC server or remote cloud servers. Thus, DRF, CA, EE, and CE are selected as benchmark schemes to compare with the JORA-DDQN approach. Considering that this slice has low sensitivity to reliability, we only draw the curves of delay and satisfaction in Fig. 15. It is observed that JORA-DDQN effectively uses the limited resources to reduce the task execution delay as much as possible. The CE scheme performs poorly because it generates additional communication delays.

VII. CONCLUSION
In this paper, we propose three types of network slices to accommodate diversified V2X services over a common physical infrastructure. Specifically, the slice for basic road safety services is used to fulfill the need for timely imminent warnings to nearby entities; the slice for enhanced road safety services aims to achieve a higher level of automated driving; and the slice for non-safety-related services focuses on improving the driving comfort and efficiency of users. Furthermore, in order to take full advantage of multi-tier resources and account for time-varying network conditions, a novel dual timescale intelligent resource management scheme is proposed. First, at the beginning of each period, the InP jaggedly tunes the multi-tier resource configuration among slices to improve system revenue. Then, constrained by the limited resources obtained from the InP, each SP makes real-time task offloading and resource scheduling decisions to maximize the long-term service satisfaction of all users. Finally, based on the effect of actions on states, we propose the JAMR and JORA-DDQN algorithms for learning the optimal strategies of the proposed problems, respectively. Simulation results show that our proposed 2Ts-IRMS can effectively guarantee the performance requirements of users and improve the system revenue compared with the benchmark algorithms.

• In view of differentiated KPIs among slices, a dual Timescale Intelligent Resource Management Scheme (2Ts-IRMS) is proposed to jaggedly divide multi-tier resources into multiple slices in the time-varying IoV.

Fig. 2. Illustration of the jagged allocation of virtualized multi-tier resources to refined network slices of V2X services, compared to flat slicing of resources to the three generic usage scenarios of 5G.

Fig. 3. Schematic of two-timescale resource allocation. First, at the beginning of each large-time period, the InP allocates shared physical resources to SPs (inter-slice resource configuration). Then, each SP elastically assigns exclusive resources to its users at each small slot (intra-slice resource scheduling).

Fig. 5. Edge computing resource and radio communication resource allocation based on the NeuralUCB algorithm.

Fig. 7. Simulation of the real road conditions around the Beijing University of Posts and Telecommunications based on SUMO.

Fig. 8. Comparison of the achieved valuation of hierarchical resource allocation schemes under various combinations of services.

Fig. 9. Comparison of the system revenue of multiple inter-slice resource configuration schemes under different unit prices of services. (a) Revenue under different unit prices of basic road safety services; (b) revenue under different unit prices of enhanced road safety services; (c) revenue under different unit prices of non-safety-related services.

Fig. 10. Comparison of the system revenue of multiple inter-slice resource configuration schemes under different penalty values of services. (a) Revenue under different penalty values of basic road safety services; (b) revenue under different penalty values of enhanced road safety services; (c) revenue under different penalty values of non-safety-related services.

Fig. 11. Resource configuration among slices under different inter-slice resource configuration schemes. (a) Allocated proportion of radio blocks; (b) allocated proportion of edge CPU cores.

Fig. 13. Performance indicators of links in the slice for basic road safety services under different intra-slice resource scheduling schemes. (a) CDF of the packet delay of links; (b) CDF of the packet reception ratio of links; (c) cumulative satisfaction of links.

Fig. 14. Performance indicators of links in the slice for enhanced road safety services under different intra-slice resource scheduling schemes. (a) CDF of the packet delay of links; (b) CDF of the packet reception ratio of links; (c) cumulative satisfaction of links.

Fig. 15. Performance indicators of links in the slice for non-safety-related services under different intra-slice resource scheduling schemes. (a) CDF of the packet delay of links; (b) cumulative satisfaction of links.
Yu Liu is currently working toward the PhD degree with the Beijing University of Posts and Telecommunications, Beijing, China. Her research interests include the Internet of Vehicles, multi-access edge computing, resource management and orchestration, and deep reinforcement learning.

Zirui Zhuang (Member, IEEE) received the BS and PhD degrees from the Beijing University of Posts and Telecommunications in 2015 and 2020, respectively. He is currently a postdoctoral researcher with the State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications. In 2019, he was a visiting scholar with the Department of Electrical and Computer Engineering, University of Houston. His research interests include network routing and management for next-generation network infrastructures using machine learning and artificial intelligence techniques, including deep learning, reinforcement learning, graph representation, multi-agent systems, and Lyapunov-based optimization.

Qi Qi (Senior Member, IEEE) received the PhD degree from the Beijing University of Posts and Telecommunications, Beijing, China, in 2010. She is currently an associate professor with the State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications. She has authored or coauthored more than 30 papers in international journals and is the recipient of two National Natural Science Foundation of China grants. Her research interests include edge computing, cloud computing, the Internet of Things, ubiquitous services, deep learning, and deep reinforcement learning.

Jingyu Wang (Senior Member, IEEE) received the PhD degree from the Beijing University of Posts and Telecommunications, Beijing, China, in 2008. He is currently a professor with the State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications. He is a senior member of CIC and was selected for the Beijing Young Talents Program. He has published more than 100 papers in venues such as ToN, IEEE Journal on Selected Areas in Communications, IEEE Transactions on Mobile Computing, CVPR, ACL, IEEE MultiMedia, ICDE, and AAAI. His research interests include broad aspects of intelligent networks, edge/cloud computing, machine learning, AIOps and self-driving networks, IoV/IoT, knowledge-defined networking, and intent-driven networking.

Dezhi Chen is currently working toward the PhD degree with the Beijing University of Posts and Telecommunications, Beijing, China. His research interests include UAV control, game theory, and next-generation mobile communication networks using reinforcement learning and artificial intelligence technologies.

Lu Lu received the master's degree from the Beijing University of Posts and Telecommunications in 2004. She is the deputy director of the Department of Network and IT Technology, China Mobile Research Institute, and the leader of the Core Network Group, CCSA TC5. Her research interests cover the mobile core network, future network architecture, and edge computing.

Hongwei Yang is a project manager with the China Mobile Research Institute. His research interests cover network intelligence and network performance measurement.

TABLE I
TRANSMISSION FEATURES AND KEY PERFORMANCE INDICATORS OF TYPICAL V2X SERVICE USE CASES