In traditional cognitive radio (CR) networks, secondary users (SUs) are always assumed to obey the rule of “introducing no interference to the primary users (PUs)”. However, this assumption may not be realistic as CR devices become increasingly intelligent. In this paper, we adopt the concept of light-handed CR, which addresses this problem by imposing “punishment” on illegal CR transmissions. We model the action decisions of the PUs as a partially observable Markov decision process (POMDP) and propose an optimal spectrum allocation scheme that maximizes their reward. The utility function also serves as the reward in this paper. Extensive simulation results show that the proposed scheme improves the reward significantly compared to the existing scheme.
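To illustrate what a POMDP formulation involves, the following is a minimal sketch of the standard Bayesian belief update that any POMDP decision maker performs after each observation. It is not the scheme proposed in the paper. The two-state model (state 0: the SU complies, state 1: the SU transmits illegally), the function name `belief_update`, and all probabilities are assumptions chosen only for illustration, and the transition model is taken for one fixed PU action.

```python
def belief_update(belief, transition, observation_prob, obs):
    """Return the posterior belief after one step and one observation.

    belief[s]               : prior probability of hidden state s
    transition[s][s2]       : P(next state = s2 | current state = s), fixed action
    observation_prob[s2][o] : P(observation = o | next state = s2)
    All models here are hypothetical, not taken from the paper.
    """
    n = len(belief)
    # Predict: push the prior belief through the transition model.
    predicted = [
        sum(belief[s] * transition[s][s2] for s in range(n)) for s2 in range(n)
    ]
    # Correct: weight by the observation likelihood, then normalize.
    unnorm = [predicted[s2] * observation_prob[s2][obs] for s2 in range(n)]
    total = sum(unnorm)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / total for u in unnorm]


# Illustrative (assumed) numbers: states 0 = compliant SU, 1 = illegal SU.
transition = [[0.9, 0.1], [0.3, 0.7]]
# Observation 0 = no interference sensed, 1 = interference sensed.
observation_prob = [[0.9, 0.1], [0.2, 0.8]]

posterior = belief_update([0.5, 0.5], transition, observation_prob, obs=1)
print(posterior)
```

In a sketch like this, the PU would then choose the action (for example, whether to punish) that maximizes its expected reward under the updated belief; how the paper defines that reward and solves for the optimal policy is not specified in this abstract.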