Resource allocation is an important issue in cognitive radio systems. It can be carried out through negotiation among secondary users. However, negotiation may incur significant overhead, since it must be repeated frequently as primary users' activity changes rapidly. In this paper, an Aloha-like spectrum access scheme without negotiation is considered for multi-user, multi-channel cognitive radio systems. To avoid the collisions caused by the lack of coordination, each secondary user learns how to select channels from its own experience. Multi-agent reinforcement learning (MARL) is applied in the framework of $Q$-learning by treating the other secondary users as part of the environment. A rigorous proof of the convergence of $Q$-learning is provided via the similarity between $Q$-learning and the Robbins-Monro algorithm, together with an analysis of the corresponding ordinary differential equation (via a Lyapunov function). The performance of learning (speed and gain in utility) is evaluated by numerical simulations.
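The learning loop described above can be illustrated with a minimal sketch. It is not the paper's algorithm or simulation setup: the stateless form of the $Q$-update, the Boltzmann (softmax) channel selection, the 0/1 reward, the independent per-slot idle probability, and every name and default value (`simulate`, `boltzmann_choice`, `idle_prob`, `temperature`) are assumptions made for illustration. Each secondary user keeps one $Q$-value per channel and updates only the channel it tried, with a step size of 1/n for the n-th visit, which satisfies the usual Robbins-Monro step-size conditions.

```python
import math
import random


def boltzmann_choice(q_values, temperature, rng):
    """Sample a channel index with probability proportional to exp(Q / temperature)."""
    m = max(q_values)  # subtract the maximum for numerical stability
    weights = [math.exp((q - m) / temperature) for q in q_values]
    return rng.choices(range(len(q_values)), weights=weights, k=1)[0]


def simulate(num_users=2, num_channels=2, steps=5000,
             temperature=0.1, idle_prob=0.8, seed=0):
    """Run independent stateless Q-learners; return one row of Q-values per user.

    Assumed model (hypothetical, for illustration only): in every slot each
    channel is free of primary users with probability idle_prob, and a
    secondary user earns reward 1 only if its channel is idle and no other
    secondary user picked the same channel.
    """
    rng = random.Random(seed)
    q = [[0.0] * num_channels for _ in range(num_users)]
    visits = [[0] * num_channels for _ in range(num_users)]
    for _ in range(steps):
        idle = [rng.random() < idle_prob for _ in range(num_channels)]
        # No negotiation: every user picks from its own Q-values only.
        picks = [boltzmann_choice(q[u], temperature, rng) for u in range(num_users)]
        for u, ch in enumerate(picks):
            collided = picks.count(ch) > 1
            reward = 1.0 if idle[ch] and not collided else 0.0
            visits[u][ch] += 1
            alpha = 1.0 / visits[u][ch]  # decaying step size: 1/n on the n-th visit
            q[u][ch] += alpha * (reward - q[u][ch])
    return q


if __name__ == "__main__":
    for user, row in enumerate(simulate()):
        print(user, [round(v, 3) for v in row])
```

With this step size, each $Q$-value is the running average of the rewards the user observed on that channel, so the other users' behavior enters only through the reward signal, in line with treating them as part of the environment.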