Skip to Main Content
We present constrained stochastic approximation algorithms for computing the locally optimal policy of a constrained average cost finite state Markov decision process. The stochastic approximation algorithms require computation of the gradient of the cost function with respect to the parameter that characterizes the randomized policy. This is computed by novel simulation based gradient estimation schemes involving weak derivatives. The algorithms proposed are simulation based and do not require explicit knowledge of the underlying parameters such as transition probabilities. We present three classes of algorithms based on primal dual methods, augmented Lagrangian (multiplier) methods and gradient projection primal methods. Unlike neuro-dynamic programming methods such as Q-Learning, the algorithms proposed here can handle constraints and time varying parameters.