In many urban areas where traffic congestion does not follow the typical peak pattern, conventional traffic signal timing methods do not result in efficient control. One alternative is to let traffic signal controllers learn how to adjust the lights based on the traffic situation. However, this creates a classical non-stationary environment, since each controller is adapting to the changes caused by the other controllers. In multi-agent learning this is likely to be inefficient and computationally challenging; that is, efficiency decreases as the number of agents (controllers) grows. In this paper, we model a relatively large traffic network as a multi-agent system and apply techniques from multi-agent reinforcement learning. In particular, Q-learning is employed, with the average queue length of the approaching links used to estimate the state. A parametric representation of the action space makes the method extensible to different types of intersections. The simulation results demonstrate that the proposed Q-learning approach outperforms the fixed-time method under different traffic demands.
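The core mechanism described above (tabular Q-learning per controller, with states estimated from the average queue length of approaching links) can be sketched as follows. This is a minimal illustrative sketch, not the paper's exact formulation: the bin size, learning rate, discount factor, exploration rate, number of states, and two-phase action set are all assumed values chosen for the example.

```python
import random

# Illustrative tabular Q-learning for a single signal controller.
# Assumed hyperparameters (not taken from the paper):
ALPHA = 0.1       # learning rate
GAMMA = 0.9       # discount factor
EPSILON = 0.1     # exploration probability
N_STATES = 5      # discretized average-queue-length levels (assumed)
ACTIONS = [0, 1]  # e.g. two candidate signal phases (assumed)

def discretize_queue(avg_queue_len, bin_size=4, n_states=N_STATES):
    """Map the average queue length (vehicles) of the approaching
    links to a discrete state index."""
    return min(int(avg_queue_len // bin_size), n_states - 1)

def select_action(Q, state, epsilon=EPSILON, rng=random):
    """Epsilon-greedy action selection over the phase set."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(Q, s, a, reward, s_next):
    """One Q-learning backup:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += ALPHA * (reward + GAMMA * best_next - Q[(s, a)])
    return Q[(s, a)]

# Usage: one learning step. A natural reward is the negative queue
# length, so shorter queues are preferred (an assumption here).
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
s = discretize_queue(10.0)                 # avg queue of 10 vehicles
a = select_action(Q, s, epsilon=0.0)       # greedy for determinism
new_val = q_update(Q, s, a, reward=-10.0,
                   s_next=discretize_queue(6.0))
```

In a multi-agent setting, each intersection would run its own copy of this loop; the non-stationarity mentioned in the abstract arises because each controller's environment includes the other, simultaneously learning controllers.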