This paper presents a policy gradient semi-Markov decision process (SMDP) algorithm for the call admission control and routing functions of an integrated network. Such systems must handle several classes of calls that differ in value and in resource requirements. Maximizing the average reward (or cost) of admitted calls per unit time is naturally formulated as an SMDP problem, but one too complex to solve exactly. We therefore propose a policy gradient algorithm that searches for the optimal call admission control and routing policy within a parameterized space of randomized policies. To implement this gradient algorithm, we approximate the gradient of the average reward. We then present a simulation-based algorithm (GSMDP) that estimates this approximate gradient using only a single sample path of the Markov chain underlying the SMDP for the call admission control and routing problem. Experimental simulations comparing its performance with that of other methods demonstrate the robustness of our algorithm.
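The single-sample-path gradient estimation described above can be sketched as a likelihood-ratio (GPOMDP-style) estimator. The sketch below uses illustrative assumptions not specified in the abstract: a single link with unit-bandwidth calls, geometric holding times, a two-parameter logistic admission policy, and a discount factor `beta` for the eligibility trace; it is a toy model of the technique, not the paper's algorithm.

```python
import math
import random


def admit_prob(theta, load):
    """Logistic (randomized) admission policy: probability of accepting
    a new call given the current link load. Hypothetical policy class."""
    z = theta[0] + theta[1] * load
    return 1.0 / (1.0 + math.exp(-z))


def estimate_gradient(theta, steps=5000, beta=0.9, capacity=10,
                      reward=1.0, depart_prob=0.2, seed=0):
    """GPOMDP-style estimate of the average-reward gradient from a single
    simulated sample path of a toy single-link admission-control chain."""
    rng = random.Random(seed)
    load = 0
    trace = [0.0, 0.0]   # discounted eligibility trace of the score function
    grad = [0.0, 0.0]    # running average of reward * trace
    for t in range(1, steps + 1):
        # Departures: each call in service ends independently this step.
        load = sum(1 for _ in range(load) if rng.random() > depart_prob)
        if load < capacity:
            p = admit_prob(theta, load)
            accept = rng.random() < p
            # Score function of the Bernoulli admission decision.
            score = (1.0 - p) if accept else -p
        else:
            accept, score = False, 0.0  # forced rejection: no policy choice
        feat = [1.0, float(load)]
        for i in range(2):
            trace[i] = beta * trace[i] + score * feat[i]
        r = reward if accept else 0.0
        if accept:
            load += 1
        for i in range(2):
            grad[i] += (r * trace[i] - grad[i]) / t  # incremental average
    return grad
```

A gradient-ascent loop would repeatedly call `estimate_gradient` and step `theta` in the estimated direction; with multiple call classes and routes, the Bernoulli decision generalizes to a softmax over admit/route actions.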