Multi-Source Multi-Destination Hybrid Infrastructure-Aided Traffic Aware Routing in V2V/I Networks

The concept of the “connected car” offers the potential for safer, more enjoyable and more efficient driving and eventually autonomous driving. However, in urban Vehicular Networks (VNs), the high mobility of vehicles along roads poses major challenges to the routing protocols needed for a reliable and flexible vehicular communications system. Thus, urban VNs rely on static Road-Side-Units (RSUs) to forward data and to extend coverage across the network. In this paper, we first propose a new Q-learning-based routing algorithm, namely Infrastructure-aided Traffic-Aware Routing (I-TAR), which leverages the static wired RSU infrastructure for packet forwarding. Then, we focus on the multi-source, multi-destination problem and the effect this imposes on node availability, as nodes also participate in other communications paths. This motivates our new hybrid approach, namely Hybrid Infrastructure-aided Traffic Aware Routing (HI-TAR) that aims to select the best Vehicle-to-Vehicle/Infrastructure (V2V/I) route. Our findings demonstrate that I-TAR can achieve up to 19% higher average packet-delivery-ratio (APDR) compared to the state-of-the-art. Under a more realistic scenario, where node availability is considered, a decline of up to 51% in APDR performance is observed, whereas the proposed HI-TAR in turn can increase the APDR performance by up to 50% compared to both I-TAR and the state-of-the-art. Finally, when multiple source-destination vehicle pairs are considered, all the schemes that model and consider node availability, i.e. limited-availability, achieve from 72.2% to 82.3% lower APDR, when compared to those that do not, i.e. assuming full-availability. However, HI-TAR still provides 34.6% better APDR performance than I-TAR, and ~40% more than the state-of-the-art.


I. INTRODUCTION
Smart cities can benefit from VNs as they could provide safer, more efficient and more enjoyable driving [1]. Unfortunately, existing communications systems are not able to achieve the adequate coverage, long-lasting stability and large throughput required by such smart city applications due to their highly dynamic and complex environments [2]. Thus, future intelligent VNs can rely on multi-hop communications to achieve increased data transmission and coverage [3]. As a consequence, efficient routing algorithm design has attracted a lot of research attention due to its critical role in achieving successful communications. However, due to the uncontrolled and unpredictable nature of mobile networks, routing in them faces many challenges, which become even more severe in the case of VNs [4]. For example, the frequent network topology changes as well as the non-uniform spread of vehicles over the available road infrastructure can often lead to frequent network partitions and, consequently, to low communications performance. This nature of VNs prevents network designers from realizing applications with a wide range of quality of service (QoS) requirements, such as high throughput and low delay [5].
A variety of conventional routing techniques have been proposed. Traditional geographical routing techniques, for example, provide design simplicity and scalability [6], [7], [8]. However, these techniques do not work well in VNs, as they do not consider real-time road traffic information and thus are unable to meet the QoS requirements of urban VNs [9]. To remedy such limitations, techniques such as cross-layer traffic-aware routing and infrastructure-aided routing have been introduced [10]. Traffic-aware routing protocols are believed to be the most promising forwarding strategy in urban VNs. Obtaining knowledge of the neighboring vehicles in the network has also been shown to be beneficial for routing performance in both sparse and dense environments [11].
Intersection-based routing was shown to be another performance enhancer for routing in [9]. The main reason behind this is that network congestion and interference issues are encountered more often at intersections, mostly due to their relatively increased vehicle density [12]. Moreover, nodes cannot easily communicate with other nodes placed on different road segments due to NLOS conditions. Additionally, road-side units (RSUs) are considered and placed around intersections in urban implementations of VNs, such as [10], [13], [14] and can assist when the 'coverage hole' issue is encountered and provide VNs with the desired connectivity.
Routing design was shown to further benefit from the use of reinforcement learning (RL) techniques, which can combat the challenges imposed by the highly dynamic environment of VNs [9]. By employing an RL-aided traffic-aware routing protocol, the traffic information is dynamically learnt throughout the network. For example, the Q-learning algorithm allows network participants to map their set of possible actions to a set of environment states. More explicitly, the nodes explore their surroundings by frequently transmitting packets into the network and thus attain and update the latest state of the environment, which gradually and intelligently enables the learning of the dynamic environment [10], [15], [16], [17].
However, considering the large number of nodes in urban VNs, the queuing phenomena can lead to increased processing overhead as multiple attempts to use a node as a next hop are made, and thus, become harmful to the routing approach's efficiency. This is even more evident when RSUs are considered within the VNs since they can easily transform into bottlenecks as they become unavailable due to high demand from other nodes. Against the above discussion, the contributions in this paper can be highlighted as follows.

A. CONTRIBUTIONS
As the state-of-the-art approaches to routing in VNs still present the aforementioned challenges, we propose a newly designed hybrid routing algorithm for urban VNs that relies on the static wired infrastructure to relay packets, while also considering vehicles as next-hop relays whenever the RSUs are not available. In Fig. 1 we present a timeline-based comparison of the most popular vehicular routing schemes, organised by their innovative idea, for a better understanding of how routing design has evolved in VNs. In addition, we highlight the main novelty points tackled in this paper against the state-of-the-art in Table 1. Against this background, our contributions can be summarized as follows:
• We propose a new infrastructure-based traffic-aware routing algorithm (I-TAR) for VNs, where the static infrastructure used for relaying messages through the network is wire-connected. We show that up to 19% higher APDR can be achieved compared to our benchmark, QTAR, while maintaining the same quality-of-service (QoS) requirements.
• We model the availability of the nodes, based on their participation in multiple communications paths at the same time in a multi-source, multi-destination scenario, for a more realistic network model. Then, we show that QTAR as well as our proposed scheme, I-TAR, suffer an APDR reduction of more than 50% when the more realistic limited-availability (LA) scenario, which considers node availability, is employed.
• We propose the hybrid infrastructure-based traffic-aware routing algorithm (HI-TAR), a new hybrid routing technique that chooses the best V2V/I path to the destination, which can improve the APDR by up to 50% when compared to I-TAR as well as QTAR.
• We provide performance analyses of the proposed hybrid routing scheme, HI-TAR, and compare it with I-TAR and QTAR. These show that, for the limited-availability scenario, HI-TAR still provides 34.6% better APDR performance than its previous version, I-TAR, and around 40% more than QTAR.
The rest of this paper is organized as follows. Section II discusses previous efforts towards state-of-the-art routing methods. In Section III, we describe our network model, followed by the description of our chosen benchmark algorithm. Then, in Section IV, we explain our newly proposed algorithm, I-TAR, as well as our approach for modelling node availability and the hybrid routing algorithm, namely HI-TAR. In Section V, we discuss the simulation environment and provide simulation results and their discussions. Section VI concludes the paper.

II. RELATED WORK
Many traffic-aware routing protocols have been proposed that make routing decisions by considering multiple traffic awareness-related metrics and, as a result, significantly reduce the probability of communication failure [9], [10], [16], [19], [20], [21]. A-STAR [21], for example, is a Global State Routing (GSR) based routing algorithm that relies on the information collected from bus routes to estimate probabilities for each road segment to provide sufficient coverage and implicitly a successful communication session. Unfortunately, such anchor vehicles only cover the road-network partly, which often leads to network partitioning and to extended end-to-end delay (EED). On the other hand, GyTAR [20] was designed to gather traffic data, such as node density between intersections, by using cell data packets (CDPs). Unfortunately, besides the additional EED imposed by the use of CDPs, collecting real-time traffic information can prove to be difficult, mostly due to the network partitioning phenomenon. A modified version of the Ad hoc On-Demand Distance Vector routing protocol (AODV) was proposed, namely Portable Fuzzy Constraints Q-learning AODV (PFQ-AODV) [19], which benefits from considering the direction of each vehicle along with communication channel estimations. On the other hand, due to the use of AODV, the 'broadcast storm' phenomenon is experienced often, which leads to degraded EED performance. Furthermore, QGRID [16] relies on collecting and using taxi data based on Shanghai's traffic, which, unfortunately, only suits a specific region.
Intersection-based routing was shown to be another performance enhancer for routing in [9]. The main reason behind this is that network congestion and interference issues are encountered more often at intersections, mostly due to their relatively increased vehicle density [12]. Moreover, nodes cannot easily communicate with other nodes placed on different road segments due to NLOS conditions. As a consequence, as seen in QTAR, routing algorithm performance can also be improved significantly by considering intersection data in the next-hop decision process [10]. Thus, RSUs are considered in urban implementations, such as iCar, iCar-II and QTAR and can assist when the 'coverage hole' issue is encountered [10], [13], [14]. In iCar and iCar-II [13], [14], the authors proposed an intersection-based traffic-aware routing protocol which aims to increase the PDR, while trying to minimise the EED. However, iCar-II requires real time global information on the infrastructure, which is challenging to obtain, especially in highly complex urban VNs. QTAR relies on the intersection-based geographic routing protocol that leverages fixed RSUs placed at the intersections to deliver the packets to the destination node. However, RSUs can easily transform into bottlenecks as they become unavailable due to high demand from other nodes. Such events should be taken into account when designing the recovery policy of the algorithm.
Unfortunately, the above-surveyed techniques are unable to provide efficient and highly reliable communications, since they cannot learn and adapt to the highly dynamic nature of the VNs. Thus, many routing protocols based on reinforcement learning have been proposed in recent years [9], [10], [15], [16], [17]. More specifically, QL-AODV [15] was designed for high mobility scenarios, where the algorithm predicts vehicular and link-state information using the Q-learning approach. However, this scheme uses a centralised routing approach, which makes it not appropriate for VNs. On the other hand, in PFQ-AODV [19], multi-hop links are evaluated based on the Q-learning algorithm. More specifically, the algorithm aims to choose the optimal next-hop, while considering per-link bandwidth, quality as well as changes in the node's speed and direction. However, the learning process and the route discovery process are performed simultaneously, and thus the rapid changes in the environment cannot be managed properly, leading to additional EED. Furthermore, QGRID [16] is a grid-based protocol which uses historical information to learn the environment in an offline manner in order to choose the next optimal grid. Unfortunately, the protocol performs poorly in a highly dynamic environment, as the offline populated Q-table can become outdated quickly. QTAR [10] uses different Q-learning techniques for Vehicle-to-Vehicle/Infrastructure (V2V/I) connections and Infrastructure-to-Vehicle/Infrastructure (I2V/I) connections to learn traffic conditions at each intersection, which leads to better routing performance than previous geographic routing protocols. Owing to the learning ability of the algorithm, after a specific time during which the learning is completed, QTAR is shown to perform better than the underlying geographic routing algorithms. 
Therefore, in this paper, we use QTAR as a benchmark that helps us highlight the contributions of our proposed schemes as discussed in the next section.

III. NETWORK MODEL
This section first describes our road-network model, then we provide an overview of our routing problem, together with our considered QoS metrics. This is followed by a description of the Q-learning based next-hop decision process as well as a short discussion on our chosen benchmark, QTAR. Moreover, all symbols used in this paper are provided in Table 2 along with their definitions.

A. ROAD-NETWORK MODEL
It is important to specify that each mobility model is specific to its scenario configuration due to the highly dynamic nature of such scenarios [22]. In our analysis, we consider and model a fixed Manhattan grid structure composed of vertical and horizontal two-lane roads which allow motion in two directions, as portrayed in Fig. 2 [23]. Traffic lights are placed at each intersection in the road-network for a more realistic modelling of the scenario. Then, with the aim of extending network coverage, we consider an RSU at each intersection in the grid. Note that only the vehicles communicate wirelessly, while the static RSUs are assumed to be wired to the local infrastructure as shown in Fig. 2. Finally, we exclude several traffic states that need to be considered separately. For example, in scenarios such as traffic jams, vehicles stop moving and, as a consequence, the wireless links established between them always stay active. Thus, only two-lane roads are considered, which makes vehicle overtaking possible in order to avoid traffic jams for our vehicle density range.

B. ROUTING PROBLEM DESCRIPTION
We consider a multi-source, multi-destination VN whose vehicles are randomly and uniformly distributed, with the aim of analyzing the proposed routing techniques and comparing them with the state-of-the-art. Each vehicle node follows the Random Waypoint Mobility Model (RWP) [24]. More specifically, vehicles are generated at random starting coordinates and then choose a target destination position. They then start moving towards the designated destination with a random, uniformly distributed speed. Note that, once the destination is reached, the vehicle chooses a new destination while this position becomes the new starting point. In our considered VN, nodes can establish successful communication links only with other nodes within their communications range. Thus, we consider V mobile vehicles that rely on the aid of multi-hop techniques to connect with each other, while making use of the fixed road-side unit (RSU) nodes placed at each intersection to extend the network coverage as described in Fig. 2. Each node is aware of its own coordinates through the use of a pre-installed Global Positioning System (GPS) as well as of its neighbors' mobility information, which it can attain and track through the regular exchange of HELLO packets [9]. The same process allows each vehicle to record Q-tables based on our link classification given the chosen QoS metrics. In more detail, each vehicle records one Q-table for other vehicles and one for the RSUs, while the RSUs likewise maintain two Q-tables each, one for vehicles and one for RSUs, which are later used in the routing decision process to find optimal source-destination communications paths.
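The Random Waypoint behaviour described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `rwp_step`, the square area bound, the lower speed bound of 0.1 m/s and the straight-line motion (which ignores the road grid) are all hypothetical choices and not part of the simulation setup used in this paper.

```python
import math
import random

def rwp_step(pos, target, speed, dt, area=3000.0, v_max=13.89, rng=random):
    """Advance one vehicle by dt seconds under a Random Waypoint model.

    Returns (new_pos, new_target, new_speed). Names and defaults are
    illustrative assumptions; motion is a straight line, not road-bound.
    """
    dx, dy = target[0] - pos[0], target[1] - pos[1]
    dist = math.hypot(dx, dy)
    step = speed * dt
    if dist <= step:
        # Waypoint reached: it becomes the new starting point, and a new
        # destination and a new uniformly distributed speed are drawn.
        new_target = (rng.uniform(0.0, area), rng.uniform(0.0, area))
        new_speed = rng.uniform(0.1, v_max)
        return target, new_target, new_speed
    scale = step / dist
    return (pos[0] + dx * scale, pos[1] + dy * scale), target, speed
```

Calling `rwp_step` once per time-step for every vehicle yields a position trace from which neighbor sets within the communications range can be derived.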
Note that, for each source-destination vehicle ($V_s$-$V_d$) pair, the information is generated solely by the assigned source vehicle ($V_s$) and that $V_s$ is also capable of transmitting directly to the destination vehicle ($V_d$), without requiring help from the fixed infrastructure. We assume that all nodes can transmit at the same power and that they are capable of perfect encoding and decoding. Moreover, we assume that there is no interference between communicating nodes in our simulations and we do not consider retransmission aspects of any nature at the medium access control (MAC) layer. The following subsection describes our QoS metrics.

C. OPTIMIZED QoS PERFORMANCE METRICS
Routing protocols aim to optimize different performance aspects of a communications system. Thus, ensuring that the QoS requirements are met remains the most critical purpose behind the routing algorithm design [9], [25]. Our chosen QoS targeted requirements are elaborated in this section, namely end-to-end delay (EED), link quality (LQ) and link expiration time (LET). Note that many other general performance factors, such as the variation of mobility, loop avoidance, node degree, multipath information and others have been considered by the literature [9].
By optimizing the EED, a routing protocol aims to achieve successful transmission between the source node, $N_s$, and the destination node, $N_d$, in the shortest time possible [26]. For example, such techniques are used in both QTAR and PP-AODV. Delay has several components, namely propagation, queuing and internal processing delay. For example, once a reliable connection (route) is established, multiple nodes will attempt to use it for packet forwarding, which in turn leads to queuing delay. Our per-link EED metric is evaluated as in Eq. (1) [26], where $dist_{N_c,N_i}$ represents the distance between the current vehicle, $N_c$, and another vehicle node, $N_i$, $c$ is the propagation speed of electromagnetic radiation in free space, Ps is the packet size and Tr represents the available transmission rate. Note that the transmission rates for V2V/I links differ from that of I2I links as the RSUs are wire-connected [22], [27].
Connection reliability is another factor shown to be of great importance when it comes to successfully routing packets throughout the network. Hence, protocols such as QTAR and PFQ-AODV estimate and ensure the reliability of a link between two nodes before considering it for packet forwarding. Unfortunately, in order to verify the reliability of a link, the two nodes defining it are required to exchange an increased number of control packets, which leads to additional overhead. Eq. (2) describes how we calculate our connection reliability metric [10], where $dist_{N_c,N_i}$ represents the two-dimensional Euclidean distance between the current node, $N_c$, and any other node in the VN, $N_i$. Cr represents the wireless transmission range, while the parameter K represents the optimal normalized distance position with respect to Cr.
Finally, the decision on the most stable link is optimised based on the computation of the link expiration time (LET) of each path. More specifically, each receiving intermediate node can use the information received from its neighbors as well as its own local mobility information to calculate the LET of each communications link [28]. Note that, in this paper, the path with the maximum LET is considered to be the most stable. The LET calculation follows [29], in which $v_{N_c}$ and $v_{N_i}$ represent the velocities of nodes $N_c$ and $N_i$, together with their velocity angles and the coordinates $(x_{N_c}, y_{N_c})$ and $(x_{N_i}, y_{N_i})$, respectively.
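To make two of the metrics above concrete, the following sketch computes a per-link delay and a link expiration time. Both formulas are assumptions based on the symbol definitions in the text: the delay is taken as propagation time plus transmission time, and the LET uses the closed form that is widely used in the mobility-prediction literature for two nodes moving at constant velocity. The function names and the handling of edge cases are hypothetical, and the link quality metric is left out because its exact form cannot be inferred from the description.

```python
import math

C_LIGHT = 299_792_458.0  # propagation speed in free space, m/s

def link_eed(dist_m, packet_bits, rate_bps):
    """Assumed per-link delay: propagation time plus transmission time."""
    return dist_m / C_LIGHT + packet_bits / rate_bps

def link_expiration_time(p_c, v_c, theta_c, p_i, v_i, theta_i, comm_range):
    """Assumed LET for two nodes that keep their current speed and heading.

    p_* are (x, y) positions in metres, v_* speeds in m/s and theta_*
    headings in radians. Returns math.inf if the nodes are in range and
    have identical velocity vectors.
    """
    a = v_c * math.cos(theta_c) - v_i * math.cos(theta_i)
    b = p_c[0] - p_i[0]
    c = v_c * math.sin(theta_c) - v_i * math.sin(theta_i)
    d = p_c[1] - p_i[1]
    rel_sq = a * a + c * c
    if rel_sq == 0.0:
        # No relative motion: the link never expires if it exists now.
        return math.inf if math.hypot(b, d) <= comm_range else 0.0
    disc = rel_sq * comm_range ** 2 - (a * d - b * c) ** 2
    if disc < 0.0:
        return 0.0  # the two trajectories never come within range
    return max(0.0, (-(a * b + c * d) + math.sqrt(disc)) / rel_sq)
```

For instance, a vehicle driving at 10 m/s away from a stationary node at the same position, with a 100 m range, stays connected for 10 s under this model.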

D. TRAFFIC-AWARE ROUTING BASED ON REINFORCEMENT LEARNING FOR URBAN VANETS
Q-learning is a reinforcement learning technique in which an agent attempts to maximise the reward it obtains by interacting with its environment on the way towards the goal state [9], [19], [30], [31], [32]. In the context of VNs, each packet sent into the network can be modelled as an agent. If the current node ($N_c$) is considered to be the current state of the agent, then the pool of available states it can interact with takes the form of $N_c$'s neighbors. More explicitly, after initialising the Q-table to 0, $N_c$ updates it through regular exchange of HELLO packets with its neighbors [33]. The Q-value evaluation upon receiving an update from $N_i$ is formulated as in Eq. (5) [33], where $R_{N_c,N_i}$ is the reward obtained by $N_c$ from the action of forwarding a packet to $N_i$ for V2V Q-learning. This reward is evaluated as in Eq. (6), where $w_1$, $w_2$ and $w_3$ are weight factors that correspond to the QoS metrics, namely link quality (LQ), link expiration time (LET) and end-to-end delay (EED), respectively, as defined in [10]. Note that $w_1 + w_2 + w_3 = 1$, while $w_1$, $w_2$ and $w_3$ can be adjusted in order to represent different QoS requirements for different applications.
The learning rate, α, serves as a Q-value update rate at each step of the learning process, which determines the ability of the learning algorithm to adapt to the changes in the environment. For example, if α is too small, the algorithm cannot keep up with changes in the network. On the other hand, a large value of α will lead to large fluctuations in the Q-values even when changes in the environment are insignificant.
The discount factor, γ, establishes how relevant rewards in the distant future are, considering changes in the environment in the immediate future. In more detail, a larger γ is favorable in a static environment, while in a highly dynamic one, i.e., a VN, a smaller value is preferred. Next, we outline the routing approach proposed by Wu et al. in QTAR for the purpose of using it as a benchmark algorithm.
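The roles of the weights, the learning rate α and the discount factor γ can be illustrated with the textbook Q-learning update. The sketch below is an assumption, not the exact form of Eqs. (5) and (6): the reward is taken as a plain weighted sum of three scores that are presumed to be normalised to [0, 1] with larger meaning better (so delay enters as a score rather than raw seconds), and the helper names are hypothetical. The defaults α = 0.8 and γ = 0.9 are the values listed in the simulation settings.

```python
import math

def link_reward(lq, let_score, eed_score, w1=1/3, w2=1/3, w3=1/3):
    """Assumed reward: weighted sum of normalised link scores."""
    if not math.isclose(w1 + w2 + w3, 1.0):
        raise ValueError("weights must sum to 1")
    return w1 * lq + w2 * let_score + w3 * eed_score

def q_update(q_table, current, neighbor, reward, neighbor_max_q,
             alpha=0.8, gamma=0.9):
    """Standard Q-learning update for the link current -> neighbor.

    neighbor_max_q is the largest Q-value the neighbor reports for its own
    next hops, e.g. as carried in a HELLO packet (an assumption here).
    """
    old = q_table.get((current, neighbor), 0.0)  # Q-table starts at 0
    new = (1.0 - alpha) * old + alpha * (reward + gamma * neighbor_max_q)
    q_table[(current, neighbor)] = new
    return new
```

With a large α the new observation dominates the stored value, which is why Q-values fluctuate strongly, whereas a small α lets the old estimate persist and slows adaptation.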

E. QTAR-A BENCHMARK ALGORITHM
Following the approach in QTAR [10], if a vehicle $V_i$ generates or receives a packet Pk to send towards the destination, it employs V2V/I Q-learning until Pk can reach $V_d$. More specifically, the path formed can pass through both vehicles and the infrastructure. However, in QTAR the RSUs are prioritised and thus, they guide the path towards the intended destination. Note that the Q-learning technique is employed separately for RSUs and the vehicles. More specifically, intersections are dynamically selected using V2I Q-learning while the next-hop vehicle nodes are chosen using V2V Q-learning. Unfortunately, there are several drawbacks to be dealt with in QTAR. Firstly, we commence with the hypothesis that prioritising traffic through the RSUs does not always guarantee the optimal routing solution, as the packets will try to reach an RSU even if a better V2V path is available for use. Secondly, as multiple $V_s$-$V_d$ pairs intend to communicate simultaneously at each time instant [10], nodes get involved in multiple communications sessions and, as a consequence, they encounter queuing issues which can make them unavailable for a new routing attempt. This shows that QTAR does not account for the availability of nodes when making the next-hop decision, which often leads to non-existing routes and implicitly to low PDRs. Therefore, in addition to traffic conditions, routing techniques should also consider the availability of both vehicular nodes and the RSUs. Hence, in our work we contest the traditional intersection-aware routing approach and propose a new hybrid routing technique that chooses the best V2V/I route while also considering the availability of each node once it takes part in an active communication link. We elaborate on these two points in the following section.

IV. HYBRID INFRASTRUCTURE-AIDED TRAFFIC-AWARE ROUTING
In this section, we explain our proposed intersection-based routing approach. More specifically, we describe the novel functionalities of our proposed network model, i.e. in I-TAR. Finally, we show our availability modelling approach and we describe our proposed hybrid routing technique, HI-TAR. There are several studies in the available literature that suggest that RSU-assisted networks can prove superior to traditional V2V-only routing protocols in terms of overall performance [34], [35], [36], [37], [38], [39]. Thus, in our newly proposed scheme, RSUs are placed in critical positions, namely at each intersection, with the purpose of integrating them into both wireless and wired data transmission. Vehicles that have data to send towards the destination can therefore use either V2V wireless links to reach other vehicles or V2I wireless links to connect to the static infrastructure. Note that wired networks can achieve better reliability and transmission speeds over longer distances through I2I wired links when compared to their unstable wireless counterparts [23], [39]. Hence, the RSUs can be always connected as well as aware of all other RSUs in the network, due to periodic signaling exchange using the wired network. Moreover, given the static nature of the RSUs, there is no motivation for using RL-aided techniques for routing the traffic between the RSUs. Hence, a simple GSR-based shortest-path algorithm is employed initially for generating the routes that traverse the wired I2I infrastructure, which are then stored in lookup tables (LUTs) at each of the RSUs such that they can be accessed at any time. As a consequence, we prioritise RSUs in the next-hop decision process of the vehicles in order to exploit the reliable and fast wired connections inside the static infrastructure.
In more detail, in I-TAR, if a vehicle $V_i$ has a packet Pk to send towards the destination, it employs V2V/I Q-learning until Pk reaches the optimal RSU of $V_i$, $RSU^{V_i}_{best}$, or its final destination vehicle, $V_d$. Each vehicle knows the location and status of every RSU within the road network. Therefore, $RSU^{V_i}_{best}$ denotes the best next-hop available RSU for vehicle $V_i$ based on the employed QoS performance metric. Then, $RSU^{V_i}_{best}$ reroutes Pk through the wired infrastructure until the best RSU for $V_d$ is found, namely $RSU^{V_d}_{best}$ [40]. Once the packet has arrived at $RSU^{V_d}_{best}$, V2V/I Q-learning is employed once again to reach $V_d$. Once the static infrastructure is reached, the data is taken towards $V_d$ over a separate infrastructure-to-infrastructure (I2I) wired channel until $RSU^{V_d}_{best}$ is reached. Finally, $RSU^{V_d}_{best}$ transmits the data to $V_d$ through infrastructure-to-vehicle (I2V) and V2V wireless channels. Hence, drivers can acquire the services they need by accessing the network at any time.
In other words, when a vehicle $V_i$ has data to send, the routing problem is reduced to achieving successful communication with the local wired infrastructure by reaching its optimal RSU, $RSU^{V_i}_{best}$, rather than routing inefficiently among all nodes in the network towards the destination vehicle, $V_d$. We portray I-TAR generated routes against pure V2V routes in Fig. 2. In addition, the pseudo-code of the I-TAR forwarding process at each node is given in Algorithm 1.
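The wired leg of an I-TAR route relies on shortest paths between RSUs that are precomputed and stored in per-RSU lookup tables. The sketch below illustrates that idea with a plain breadth-first search, swapped in here for the GSR-based algorithm because all wired links are treated as equal cost. The grid indexing of RSUs by (row, column), the 4 × 4 layout used in the example and all function names are assumptions made for illustration.

```python
from collections import deque

def grid_neighbors(node, n):
    """Yield the RSUs wired directly to `node` in an n x n grid."""
    r, c = node
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < n and 0 <= nc < n:
            yield (nr, nc)

def build_next_hop_lut(source, n):
    """BFS from `source`; returns {destination RSU: first hop from source}."""
    first_hop = {source: source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in grid_neighbors(node, n):
            if nb not in first_hop:
                # Direct neighbors are their own first hop; deeper nodes
                # inherit the first hop of the node that discovered them.
                first_hop[nb] = nb if node == source else first_hop[node]
                queue.append(nb)
    return first_hop

def build_all_luts(n):
    """One lookup table per RSU, computed once since the RSUs are static."""
    return {(r, c): build_next_hop_lut((r, c), n)
            for r in range(n) for c in range(n)}

def wired_path(luts, src, dst):
    """Follow the per-RSU tables hop by hop from src to dst."""
    path = [src]
    while path[-1] != dst:
        path.append(luts[path[-1]][dst])
    return path
```

For example, `wired_path(build_all_luts(4), (0, 0), (3, 3))` returns a seven-node route, i.e. six wired hops between opposite corners of the grid.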

B. MODELLING NODE AVAILABILITY FOR MULTIPLE $V_s$-$V_d$ PAIRS
Queuing delay at each node along the route is critical for delay-sensitive message dissemination in VNs, especially when multiple sources and destinations attempt to communicate in urban scenarios with complex street conditions and high traffic demand [41], [42]. If multiple $V_s$-$V_d$ pairs attempt to communicate at the same time, some nodes experience limited-availability (LA) as they are required to manage different queue lengths when several flows reach them simultaneously. More specifically, the packets are stored at each node until a successful communication route is provided. This can lead to queue overflowing, which translates to significant additional EED. It is important to note that a connection is considered successful only if the information is not out-dated on arrival at the destination; otherwise, the packet is dropped. Thus, nodes suffering from LA cannot provide successful communications and, as a consequence, considering the queue state at each node is of critical importance during the next-hop selection process. Therefore, in this paper, we introduce a new variable, namely the node availability, with the aim of separating LA scenarios from their full-availability (FA) counterparts by modelling the problem induced by the packet queuing phenomenon. We define the availability status of a node based on whether its maximum queue capacity was reached or not. More specifically, as nodes take part in several different communication paths at a time, once the queue overflow state is reached, they become unavailable for other transmission attempts. The I-TAR section in Fig. 2 describes the LA scenario, as the pair of red vehicles are already involved in a communication path, and hence they are not available for future routing sessions. Algorithm 2 presents how we keep track of node availability at each time-step as several $V_s$-$V_d$ pairs attempt to communicate.
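The availability bookkeeping described above can be sketched as a small tracker that marks a node unavailable once its queue capacity is reached. This is not Algorithm 2 itself, only a minimal illustration: the class name, the per-node integer load and the all-or-nothing admission of a route are assumptions.

```python
class NodeAvailability:
    """Tracks how many active paths each node serves in one time-step."""

    def __init__(self, capacity):
        self.capacity = capacity  # assumed maximum queue length per node
        self.load = {}            # node id -> number of active paths

    def is_available(self, node):
        """A node is available while its queue has not reached capacity."""
        return self.load.get(node, 0) < self.capacity

    def reserve_path(self, path):
        """Admit a route only if every node on it still has queue room."""
        if not all(self.is_available(node) for node in path):
            return False
        for node in path:
            self.load[node] = self.load.get(node, 0) + 1
        return True

    def release_path(self, path):
        """Free the queue slots once a communication session ends."""
        for node in path:
            self.load[node] = max(0, self.load.get(node, 0) - 1)
```

Setting the capacity very high reproduces the full-availability case, while a capacity of 1 models the strictest limited-availability case in which a node can serve only one path at a time.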

C. HYBRID INFRASTRUCTURE-AIDED TRAFFIC-AWARE ROUTING (HI-TAR)
Algorithm 1: I-TAR Next-Hop Routing Decision
Requirements:
→ Pk: A packet that is being transmitted in the network.
→ $V_i$: A vehicle node.
→ $V_d$: The destination vehicle of Pk.
→ $V_c$: The current vehicle that is processing Pk.
→ $NB_{V_i}$: The set of neighbor nodes of $V_i$.
→ $RSU_i$: An RSU node.
→ $RSU_c$: The current RSU that is processing Pk.
→ $NB_{RSU_i}$: The set of neighbor nodes of $RSU_i$.
Upon $V_i$ having a packet Pk to forward to $V_d$:
    [...]
        Pk has reached its destination;
    [...]
        Forward Pk to $RSU^{V_s}_{best}$ using V2V RL-aided routing;
    end
Upon $RSU_i$ having a packet Pk to forward to $V_d$:
    [...] $RSU^{V_d}_{best}$, the best available intermediary destination intersection of $V_d$;
    → Next-hop decision:
    if $RSU_c = RSU^{V_d}_{best}$ then
        Send Pk to $V_d$ using V2V Q-learning routing;
    else if $V_d \in NB_{RSU_i}$ then
        Send Pk directly to $V_d$;
    else
        Forward Pk to $RSU^{V_d}_{best}$ using a simple I2I GSR-based routing;
    end
→ Output: PATH

HI-TAR is proposed as an extension of I-TAR which takes into account the availability, i.e. the LA/FA condition, of the nodes as they take part in multiple communications paths, and which tries to provide alternative V2V routes by taking advantage of the high number of available vehicles spread across the VN, as portrayed in Fig. 2. To elaborate further, when RSUs are prioritised for packet forwarding, they become unavailable faster than other nodes on average. Eventually, this leads to network partitioning, due to the low number of available RSUs still under demand. Therefore, it is vitally important to optimise the decision on whether the data should be relayed to the RSU or not. In Algorithm 3, we introduce a hybrid technique that does not always prioritise the RSUs but also considers the best V2V path available as an alternative. As a consequence, queuing has to be considered not only at the RSUs but at all nodes. Due to the increased number of nodes spread across the network, whenever an RSU is not available for a communication session, the nodes can alternatively use V2V links. Note that the scheme can be applied to other RSU-dependent routing schemes, such as QTAR.
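The hybrid decision, in which an RSU is no longer chosen unconditionally and the best V2V link serves as an alternative, can be sketched as below. This is not Algorithm 3, only a minimal illustration under assumptions: the two dictionaries stand for the Q-tables a vehicle keeps for RSUs and for other vehicles, `is_available` is a hypothetical callback reporting the LA/FA status of a node, and ties are resolved in favour of the RSU.

```python
def hybrid_next_hop(q_vehicles, q_rsus, is_available):
    """Pick the available neighbor with the highest Q-value.

    q_vehicles and q_rsus map neighbor ids to Q-values. Returns a
    (kind, node) tuple, or None if every neighbor is unavailable.
    """
    candidates = [
        (q, kind, node)
        for kind, table in (("rsu", q_rsus), ("vehicle", q_vehicles))
        for node, q in table.items()
        if is_available(node)  # skip nodes whose queue is already full
    ]
    if not candidates:
        return None
    # max() keeps the first maximal entry, so an RSU wins a tie.
    _, kind, node = max(candidates, key=lambda item: item[0])
    return kind, node
```

When an RSU with the highest Q-value is saturated, the function falls back to the best available vehicle, which is the behaviour that avoids the RSU bottleneck discussed in this subsection.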
In the next section, we provide the performance analyses of the aforementioned schemes.

V. SIMULATION RESULTS AND DISCUSSIONS
We first describe our simulation environment and then analyse the routing performance of I-TAR and HI-TAR against the literature while varying several factors, such as the number of vehicles and the number of active V_s-V_d pairs. Furthermore, for both variations, we compare the schemes under a more realistic scenario, namely while modelling and considering node availability.
VOLUME 10, 2022

A. SIMULATION ENVIRONMENT
For the purpose of realistic evaluation in terms of both communications and mobility, a VN platform is simulated by allowing Simulation of Urban MObility (SUMO) and MATLAB to communicate with each other through the Traffic Control Interface for MATLAB (TraCI4Matlab) [43]. SUMO is used to generate movement traces for vehicles and to form mobility models, which are then used as input data for the routing performance evaluations in MATLAB. We model a 3000 m × 3000 m grid, which we divide into nine smaller 1000 m × 1000 m cells by placing intersections 1000 m apart, as shown in Fig. 2. Vehicles are randomly and uniformly distributed over the available road network. Each vehicle can decelerate to 0 m/s and accelerate up to 13.89 m/s (approximately 50 km/h), as defined by the urban speed limit. Note that velocities are normally distributed with a standard deviation of 0.1, which is the SUMO default [44].
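The geometry described above can be checked with a short sketch. It assumes that intersections also sit on the outer boundary of the grid, which is an assumption made here rather than something stated in the text; the function name is hypothetical.

```python
def grid_intersections(side_m: int = 3000, spacing_m: int = 1000):
    """Return (x, y) coordinates of intersections on a square grid.

    Assumption: intersections are placed on the boundary as well as inside.
    """
    steps = side_m // spacing_m
    return [(x * spacing_m, y * spacing_m)
            for y in range(steps + 1)
            for x in range(steps + 1)]


intersections = grid_intersections()
num_cells = (3000 // 1000) ** 2      # number of 1000 m x 1000 m cells
speed_limit_kmh = 13.89 * 3.6        # urban speed limit converted to km/h
```

Under this assumption the layout has a 4 × 4 lattice of intersections enclosing the nine cells, and 13.89 m/s corresponds to roughly 50 km/h.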
The performance of the proposed HI-TAR algorithm is studied in the following section under different simulation parameters, such as the number of vehicles and the number of V_s-V_d pairs, against its benchmarks, namely I-TAR and QTAR. More specifically, at each time-step, we randomly select a number of V_s-V_d pairs that attempt to communicate with each other wirelessly. Unless stated otherwise, the simulation time is set to 1000 s, the maximum number of V_s-V_d pairs is set to 20 and the maxSpeed is set to 10 m/s. Moreover, the Q-learning parameters α and γ are set to 0.8 and 0.9, respectively.
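To make the role of α and γ concrete, the following sketch applies the standard tabular Q-learning update with the values quoted above. The state and action encoding (current node and chosen neighbor) and the reward of 1.0 are assumptions for illustration only; the reward and state definitions actually used by the routing schemes are not restated in this section.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.8, gamma=0.9):
    """Standard Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    """
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


# Hypothetical example: node "A" forwards to neighbor "B", whose only
# onward option is "C". All Q-values start at zero.
q_table = defaultdict(float)
first = q_update(q_table, "A", "B", reward=1.0,
                 next_state="B", next_actions=["C"])
```

With α = 0.8 the estimate moves most of the way towards the new target in a single step, which suits a topology that changes quickly, while γ = 0.9 keeps the value of downstream hops strongly weighted.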

B. APDR AND AEED VS. THE NUMBER OF VEHICLES ANALYSES
We commence by presenting how our proposed techniques, namely I-TAR and HI-TAR, perform against the literature while varying the number of vehicles in the VN. Then, we apply the hybrid approach (provided in Algorithm 3) to QTAR, yielding Hybrid QTAR, to reveal how it can improve any RSU-dependent routing technique. More precisely, we analyse the APDR and AEED provided by the routing algorithms for 20 active V_s-V_d pairs while increasing the number of vehicles from 50 to 500. We first look at the idealised case where the availability of the nodes is not modelled and thus highlight the benefits of our proposed schemes in the same scenario considered by QTAR in [10]. Then, we model availability as described in Algorithm 2 and show how the different schemes perform under a more realistic configuration.

1) FULL-AVAILABILITY (FA)
With no node availability considerations, it is clear from Figure 3 that increasing the number of vehicles participating in the network from 50 to 500 leads to better APDR performance. This can be explained by the fact that the probability of network connectivity increases with the number of vehicles in the network. More specifically, as vehicles populate the network, the 'coverage hole' issue fades as there are more 'next-hop' options to choose from. Furthermore, it can be observed that I-TAR already provides a much better packet delivery performance, achieving up to 18.9% higher APDR than QTAR for 500 vehicles. This is mainly because in I-TAR the routing problem is reduced to connecting V_s and V_d to any RSU. More specifically, once the packets reach an RSU, the wired infrastructure carries them directly from the V_s side of the network towards V_d without the need to route them through all other available vehicles. It is important to note that in this case packets also benefit from a more reliable path, as the RSUs are static and thus the links between them do not break due to mobility.
Then, we show that by using the hybrid approach, APDR can be improved further for both I-TAR and QTAR. To elaborate, hybrid QTAR achieved up to 4.7% higher APDR than QTAR, while HI-TAR improved I-TAR's APDR performance by 4%. In this case, where the availability of the nodes is not considered, the performance is slightly better simply because nodes are provided with a backup option when the RSUs are unreachable.
Figure 4 depicts the AEED for the four schemes presented above while varying the number of vehicles from 50 to 500 and maintaining the number of V_s-V_d pairs at 20. It can be observed that, owing to the delay metric considered in the next-hop decision process, all four schemes provide good AEED performance, in the range of 0.21-0.27 s. As both hybrid QTAR and I-TAR can be seen as improvements to QTAR, it is expected that they will both outperform the benchmark described in Sec. III-E. More specifically, it is expected that hybrid QTAR performs better than QTAR in terms of EED, as the approach introduces shorter V2V-only routes that are not forced to always go through the infrastructure. Moreover, as I-TAR relies on the separate wired infrastructure, routes with considerably fewer hops are generated, since I2I links can cover much larger areas of the VN than regular V2V/I links. On average, I-TAR provides 2.4% lower EED than QTAR, as fewer hops are used to reach the destination. Moreover, as all nodes are considered available, the hybrid scheme only provides a shorter path between V_s and the infrastructure, and between the infrastructure and V_d. More specifically, HI-TAR improves the AEED performance of I-TAR by 7.6%, while applying the hybrid scheme improves the AEED performance of QTAR by 9.9%.

2) LIMITED-AVAILABILITY (LA)
We then present the results of the aforementioned schemes while modelling the node availability based on each node's queue state. Simulations were performed while varying the number of vehicles from 50 to 500 for 20 V_s-V_d pairs. It can be observed that the APDR performance drops significantly under such conditions, especially in the case of I-TAR and QTAR. This is expected mainly because in both of these schemes the infrastructure is always prioritised in the 'next-hop' decision process and, as a consequence, the queues at the RSUs end up choking other attempts to communicate through the network. It can be observed from Figure 5 that, in limited-availability cases, HI-TAR, I-TAR and QTAR provide 18.2%, 50.7% and 47.8% lower APDR, respectively, than their full-availability counterparts. However, by being given an alternative to unavailable infrastructure-based paths, HI-TAR manages to achieve an APDR around 42.1% higher than I-TAR and around 49.6% higher than QTAR. Moreover, when availability is modelled, HI-TAR achieves 4.7% higher APDR than QTAR in its full-availability scenario.
Furthermore, it can be seen from Figure 6 that the delay-focused metric provides a good overall AEED performance. Moreover, a trade-off is revealed when analysing the limited-availability scenarios. More explicitly, QTAR and I-TAR provide 6% and 11.5% lower delay, respectively, than their full-availability counterparts. However, when we consider node availability, HI-TAR provides 2.9% higher delay than in its full-availability scenario. If the hybrid approach is not used, the packets are simply dropped when encountering an unavailable node, while the delay does not change for successful paths. By contrast, the hybrid approach is able to achieve a higher APDR at the cost of a small increase in AEED. More specifically, in the hybrid scenarios, when a node is not available, the routing algorithm looks to work around that node by taking a longer alternative path. Hence, the packet is successfully delivered, but over a longer path that incurs a higher delay.

C. APDR AND AEED VS. THE NUMBER OF V_s-V_d PAIRS ANALYSIS
Finally, we fix the number of vehicles at 300 and analyse our proposed schemes in both FA and LA scenarios while varying the number of V_s-V_d pairs attempting to communicate with each other at each time-step of our simulation. This analysis reveals the effect of multiple V_s-V_d pairs on node availability and implicitly on our routing performance. Initially, we looked at the full-availability scenarios, where no availability is considered. As expected, there is no effect on routing performance as we increase the number of V_s-V_d pairs. Nodes are always considered available, which means they will always be considered in the routing decision process as potential next-hops. However, the APDR of all schemes drops significantly in the limited-availability scenario. More specifically, nodes become involved in several communication sessions, making them unavailable for other communication paths. The more V_s-V_d pairs we consider, the fewer packets end up at the intended destination. It can be seen in Fig. 7 that in LA scenarios, HI-TAR, I-TAR and QTAR deliver 72.2%, 82.3% and 81.9% fewer packets successfully, respectively, when compared to their FA counterparts for 100 active V_s-V_d pairs. However, out of the three mentioned schemes, HI-TAR still provides the best APDR performance. More specifically, HI-TAR achieves 34.6% more successfully transmitted packets than its I-TAR counterpart, and around 39.6% more than QTAR. This indicates that the hybrid approach helps when multiple sources attempt to connect to multiple destinations, as it provides alternative paths that support successful and reliable communications across the network. Fig. 8 depicts the AEED for QTAR, I-TAR and HI-TAR for the same setup. Again, we can observe the effect of the delay-oriented metric used in our routing algorithm, as all three schemes perform well.
To elaborate, among the full-availability scenarios, HI-TAR is able to achieve the lowest AEED, precisely 6.8% less than QTAR and 4.1% less than I-TAR. The reason is again that the hybrid technique provides the algorithm with the freedom of choosing the best path based on our chosen metric rather than restricting traffic to always pass through the infrastructure. However, when node availability is considered, all three schemes achieve a lower AEED than in the previous scenarios. This is expected, as most longer paths are dropped due to unavailable nodes along them. Note that the more hops a path has, the higher the chance of encountering an unavailable node. Hence, longer paths, which implicitly have a higher EED, are unsuccessful, which leads to a smaller delay overall at the cost of a significantly lower PDR. Moreover, the trade-off between APDR and AEED for the hybrid approach is observable once more here. With availability modelled, I-TAR provides 13.4% lower AEED than QTAR. However, for the same scenario, HI-TAR provides approximately the same AEED performance as QTAR, namely 13.5% higher AEED than I-TAR. More specifically, once the hybrid scheme is introduced, the delay increases slightly while the APDR increases.
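The argument that dropping long paths lowers the measured delay can be illustrated with a toy calculation. The sketch below assumes that each relay node is available independently with the same probability and that AEED is averaged over delivered packets only; both are simplifying assumptions, and the two example paths and the probability 0.5 are made-up values, not simulation results.

```python
def delivery_probability(relays: int, p_available: float) -> float:
    """Probability that every relay on a path is available, assuming
    independent relays with identical availability (an assumption)."""
    return p_available ** relays


def delivered_mean_delay(paths, p_available: float) -> float:
    """Expected delay over delivered packets only.

    `paths` is a list of (number_of_relays, end_to_end_delay_s) tuples.
    """
    weights = [delivery_probability(r, p_available) for r, _ in paths]
    weighted = sum(w * delay for w, (_, delay) in zip(weights, paths))
    return weighted / sum(weights)


# Hypothetical short and long path: (relays, delay in seconds).
example_paths = [(1, 0.10), (5, 0.50)]
fa_delay = delivered_mean_delay(example_paths, 1.0)   # every node available
la_delay = delivered_mean_delay(example_paths, 0.5)   # limited availability
```

In this toy setting the long path is rarely completed under limited availability, so it contributes little to the average and the mean delay of the delivered packets falls, even though far fewer packets arrive.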

D. PROCESSING COMPLEXITY DISCUSSION
This subsection briefly discusses the processing complexity of routing in VNs at both node and network level, based on the routing approach employed. Since most of the processing power is needed during the decision-making stage of the transmission process, this discussion compares the total number of RL-aided routing decisions against the total number of transmission links in a benchmark route. As discussed in Section IV-A, RL-aided routing is only needed for V2V, V2I and I2V transmissions, while the processing of wired I2I transmissions is negligible in that sense, as the road-side infrastructure is static and each RSU is assumed to already have pre-calculated lookup tables (LUTs) that store the ideal I2I routes to all other RSUs in the VN. Thus, whenever packets reach a vehicle, the vehicle always performs RL-based routing to find a next-hop node to transmit them to. On the other hand, the RSUs only do so when they transmit the packets to vehicles, but not to other RSUs, as I2I transmissions are performed by the separate wired I2I infrastructure.
With that in mind, the following processing complexity comparison between HI-TAR and its predecessors, I-TAR and QTAR, is provided, based on a few route examples presented in Figure 9. In QTAR, all nodes perform V2V/I RL-based routing all the time, as there is no separate wired I2I infrastructure to rely on. As a consequence, for a given 5-node QTAR route, the first 4 nodes have to process the packets and decide on a next-hop through RL-aided routing, similarly to a V2V-only route. By contrast, I-TAR takes advantage of the wired I2I infrastructure and thus the RL-aided routing processing is only required at the vehicles until the I2I infrastructure is reached, namely at RSU_{V_s}^{best}, and once again at RSU_{V_d}^{best}, where the packets leave the wired I2I infrastructure and are forwarded through vehicle next-hop nodes as they try to reach V_d. Thus, for a given 5-node route, the processing complexity of an I-TAR route can be as low as 2 RL processes, one for each of the two V2V/I links required, while the wired RSU infrastructure assures network connectivity. Note that, in I-TAR, occasionally, if V_s is closer to V_d than to RSU_{V_s}^{best}, the wired I2I infrastructure is not needed, and I-TAR will produce short V2V-only paths with a processing complexity similar to that of QTAR or V2V-only routing approaches. However, in most cases, especially as the dimensions and the complexity of the road-network structure are scaled up following realistic urban VN scenarios, the wired I2I infrastructure becomes critical, as there is a much higher chance that V_s and V_d will not reach each other without making use of the RSUs. Finally, as depicted in Figure 9, in HI-TAR, some of the successful routes are obtained through V2V-only RL-based routing while others are similar to I-TAR routes, as both options are provided through the hybrid technique.
As a consequence, V2V-only routing can be considered the worst case for the processing complexity of HI-TAR, while I-TAR provides the best case. Therefore, considering that a 5-node V2V/I route is required to assure connectivity between V_s and V_d, HI-TAR can incur a processing complexity of up to 4, for 4 V2V/I links, but it can go as low as 2, for 2 V2V/I links, as provided by I-TAR. Moreover, V2V/I paths are found to be much longer than I-TAR generated routes in terms of the number of hops, as the infrastructure assures coverage across the VN and hence a multitude of V2V/I links is not needed. For example, let us assume that V_s requires a 100-link V2V/I route to reach V_d across the proposed VN while using QTAR or V2V-only routing. I-TAR, on the other hand, could lower the processing complexity to as little as 2. More specifically, in this idealised scenario, RL processing is required for 2 V2V/I links only, as V_s is within range of RSU_{V_s}^{best} and RSU_{V_d}^{best} is within range of V_d. In more detail, in such a scenario, the only two RL-based routing processes are performed between V_s and RSU_{V_s}^{best} and between RSU_{V_d}^{best} and V_d, as shown in Figure 9, while the processing complexity of the I2I links can be ignored as it relies on pre-configured static routing. However, this was merely a discussion based on the functionality of our algorithms. Further complexity analyses are needed for the proper evaluation of routing in VNs, which fall within the scope of our future work.
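The counting rule used in this discussion (one RL decision per transmitting node, except on wired I2I links) can be expressed as a small sketch. The route encoding as a list of "V" and "RSU" labels and the `wired_i2i` flag are assumptions introduced here for illustration.

```python
def rl_decisions(route, wired_i2i: bool) -> int:
    """Count RL-aided next-hop decisions along a route from V_s to V_d.

    `route` lists node kinds in order, each "V" (vehicle) or "RSU".
    When `wired_i2i` is True, RSU-to-RSU links are served by pre-computed
    lookup tables and are not counted; otherwise every link is counted.
    """
    count = 0
    for sender, receiver in zip(route, route[1:]):
        is_i2i = sender == "RSU" and receiver == "RSU"
        if not (wired_i2i and is_i2i):
            count += 1
    return count


five_node_route = ["V", "RSU", "RSU", "RSU", "V"]
qtar_like = rl_decisions(five_node_route, wired_i2i=False)   # every link counted
itar_like = rl_decisions(five_node_route, wired_i2i=True)    # I2I links skipped
v2v_only = rl_decisions(["V"] * 5, wired_i2i=True)           # no RSUs on the route
```

For the 5-node examples in the text this gives 4 decisions without a wired I2I infrastructure and 2 with it, while a V2V-only route of the same length again requires 4.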

VI. CONCLUSION
In this paper, we first proposed a new vehicular routing algorithm, named Infrastructure-aided Traffic Aware Routing (I-TAR), which uses the static wired RSU infrastructure for packet forwarding. We also proposed a new hybrid approach, namely Hybrid Infrastructure-aided Traffic Aware Routing (HI-TAR), which aims to solve the multi-source, multi-destination problem and the effect this imposes on node availability. Moreover, we applied the hybrid approach to the state-of-the-art algorithms and showed how RSU-dependent routing algorithms can be improved through its use. Against these adaptations, we then examined the effect of varying several critical parameters on the routing performance of VNs. More specifically, we looked at the APDR and AEED performance of the routing algorithms while varying the number of vehicles in the network, as well as the number of active V_s-V_d pairs, to better reveal the challenges imposed on node availability. We demonstrated the effectiveness of our hybrid approach in terms of APDR and AEED performance through extensive simulations. As future work, we will consider interference, packet collisions and retransmission techniques as well as further computational complexity analysis.

MOHAMMED EL-HAJJAR (Senior Member, IEEE) received the Ph.D. degree in wireless communications from the University of Southampton, U.K., in 2008. Following the Ph.D. degree, he joined Imagination Technologies, as a Design Engineer, where he worked on designing and developing Imagination's multi-standard communications platform, which is used in many mobile and communications devices. He is currently an Associate Professor with the School of Electronics and Computer Science, University of Southampton. He has published a Wiley-IEEE book and in excess of 100 journal and conference papers.
His research interests include the development of intelligent communications systems, energy-efficient transceiver design, MIMO, millimeter-wave communications, and radio over fiber network design. He was a recipient of several academic awards.