
A Reinforcement Learning Framework for Vehicular Network Routing Under Peak and Average Constraints


Abstract:

Providing provable performance guarantees in vehicular network routing problems is crucial to ensure safe and timely delivery of information in an environment characterized by high mobility, dynamic network conditions, and frequent topology changes. While Reinforcement Learning (RL) has shown great promise in network routing, existing RL-based solutions typically support decision-making with either peak constraints or average constraints, but not both. For network routing in intelligent transportation, such as advanced vehicle control and safety, both peak constraints (e.g., maximum latency or minimum bandwidth guarantees) and average constraints (e.g., average transmit power or data rate constraints) must be satisfied. In this paper, we propose a holistic framework for RL-based vehicular network routing, which optimizes routing decisions under both average and peak constraints. The routing problem is modeled as a Constrained Markov Decision Process and recast into an optimization based on Constraint Satisfaction Problems (CSPs). We prove that the optimal policy of a given CSP can be learned by an extended Q-learning algorithm while satisfying both peak and average latency constraints. To improve the scalability of our framework, we further turn it into a decentralized implementation through a cluster-based learning structure. Applying the proposed RL algorithm to vehicular network routing problems under both peak and average latency constraints, simulation results show that our algorithm achieves much higher rewards than heuristic baselines, with over 40% improvement in average transmission rate, while resulting in zero violations of both peak and average constraints.
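The abstract's core idea can be illustrated with a minimal sketch: a tabular Q-learning loop in which a hard per-hop latency bound (the peak constraint) is enforced by masking infeasible links before action selection, and a long-run average latency budget is handled by a Lagrange multiplier updated via dual ascent. This is an illustrative toy, not the paper's extended Q-learning algorithm or CSP reformulation; the two-state MDP, link parameters, and constraint values below are assumptions chosen for demonstration.

```python
import random

# Toy routing MDP: at each state (network node), pick a next-hop link.
# Each link has an (immediate reward, latency) pair. Illustrative numbers only.
ACTIONS = {
    0: [(1.0, 2.0), (3.0, 6.0)],   # state 0: (reward, latency) per link
    1: [(2.0, 1.0), (4.0, 9.0)],   # state 1: second link violates the peak bound
}
PEAK_LATENCY = 8.0    # hard per-hop latency bound (peak constraint)
AVG_LATENCY = 3.0     # long-run average latency budget (average constraint)

def train(episodes=5000, alpha=0.1, gamma=0.9, eps=0.1, eta=0.01, seed=0):
    rng = random.Random(seed)
    Q = {s: [0.0] * len(a) for s, a in ACTIONS.items()}
    lam = 0.0                      # Lagrange multiplier for the average constraint
    for _ in range(episodes):
        s = rng.choice([0, 1])
        # Peak constraint: mask out links whose latency exceeds the hard bound,
        # so an infeasible action can never be selected.
        feasible = [i for i, (_, lat) in enumerate(ACTIONS[s]) if lat <= PEAK_LATENCY]
        if rng.random() < eps:
            a = rng.choice(feasible)                    # explore among feasible links
        else:
            a = max(feasible, key=lambda i: Q[s][i])    # exploit among feasible links
        r, lat = ACTIONS[s][a]
        s2 = rng.choice([0, 1])                         # toy random next node
        # Average constraint: penalize latency through the multiplier,
        # i.e., learn on the Lagrangian reward r - lam * latency.
        shaped = r - lam * lat
        Q[s][a] += alpha * (shaped + gamma * max(Q[s2]) - Q[s][a])
        # Dual ascent: raise lam when over budget, relax it (toward 0) otherwise.
        lam = max(0.0, lam + eta * (lat - AVG_LATENCY))
    return Q, lam
```

The masking step guarantees zero peak-constraint violations by construction, while the multiplier only drives the *average* latency toward its budget; this mirrors the distinction the abstract draws between the two constraint types.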
Published in: IEEE Transactions on Vehicular Technology ( Volume: 72, Issue: 5, May 2023)
Page(s): 6753 - 6764
Date of Publication: 11 January 2023

I. Introduction

Vehicular networks, as a key enabler for intelligent transportation, have received growing attention from both industry and academia in recent years [1], [2]. It is expected that an unprecedented amount of data will be shared through real-time communications between vehicles and infrastructure to support various new services such as advanced vehicle control and safety. Traffic routing in vehicular networks, which are characterized by high-mobility nodes, dynamic channel conditions, and frequent topology changes, requires solving a challenging online optimization problem [2], [3], [4], [5], [6], [7], [8], [9], [10]. To this end, learning techniques, especially reinforcement learning (RL), have been employed for online decision making in vehicular network routing problems and have shown great promise [11], [12], [13], [14].
