Multi-hop wireless networks can provide flexible network infrastructures at a low cost. However, most existing wireless networking solutions are designed for delay-insensitive applications and therefore perform poorly when handling delay-sensitive applications. Traditionally, network design problems are formulated as static optimizations, under the assumption that the network characteristics remain static. However, these solutions are not optimal when the environment is dynamic. Recently, several research works have applied machine learning to maximize the performance of multi-hop wireless networks in dynamic environments. These works either focus only on determining policies at the network layer, without considering the actions of the lower layers, or use centralized learning approaches, which are inefficient for delay-sensitive applications because of the large delay incurred when propagating messages throughout the network. In this paper, we propose a new solution that enables the nodes to autonomously determine their routing and transmission power so as to maximize the network utility in a dynamic environment. We formulate the problem as a Markov decision process and propose a distributed computation of the optimal policy. Moreover, we use reinforcement learning to find the optimized policy when the dynamics are unknown. We explicitly consider the impact of the information overhead on the network performance, and propose several novel algorithms to reduce this overhead.
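To make the reinforcement-learning idea concrete, the following is a minimal sketch of standard tabular Q-learning for a single node that jointly picks a next hop and a transmission power level. It is not the distributed algorithm proposed in the paper, whose details are not given here. The state names, the action set `ACTIONS`, the toy environment `toy_step`, and all numeric values (success probabilities, power cost, learning rate, discount factor) are assumptions chosen only for illustration.

```python
import random

# Hypothetical action set: every (next_hop, power_level) pair the node may pick.
ACTIONS = [(hop, power) for hop in ("A", "B") for power in (1, 2)]


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


def choose_action(q, state, actions, epsilon, rng):
    """Epsilon-greedy choice: explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))


def toy_step(state, action, rng):
    """Toy environment (assumed, not from the paper): returns (reward, next_state)."""
    hop, power = action
    # Assumed delivery probability: more power helps, a bad channel hurts,
    # and hop "B" is slightly more reliable than hop "A".
    p_success = 0.5 + 0.2 * (power - 1) + (0.1 if hop == "B" else 0.0)
    if state == "bad":
        p_success -= 0.3
    delivered = rng.random() < p_success
    # Reward: utility of a delivered packet minus an assumed power cost.
    reward = (1.0 if delivered else 0.0) - 0.1 * power
    # The channel state evolves independently of the action in this toy model.
    next_state = "good" if rng.random() < 0.7 else "bad"
    return reward, next_state


def train(steps=5000, epsilon=0.1, seed=0):
    """Learn a Q-table by interacting with the toy environment; dynamics are never read directly."""
    rng = random.Random(seed)
    q, state = {}, "good"
    for _ in range(steps):
        action = choose_action(q, state, ACTIONS, epsilon, rng)
        reward, next_state = toy_step(state, action, rng)
        q_update(q, state, action, reward, next_state, ACTIONS)
        state = next_state
    return q
```

Calling `train()` returns a dictionary that maps `(state, (next_hop, power_level))` pairs to estimated long-term utility. The learner only observes sampled rewards and state transitions, which is what allows this kind of method to work when the network dynamics are unknown.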