The policy iteration algorithm for average reward Markov decision processes with general state space
