An important practical constraint on admissible control policies is defined for the Markov decision process. The framework of an algorithm based on the infinite return optimization algorithms of Howard and Jewell is suggested to compute the optimal policy under this constraint. Iterative convergence to the optimal policy cannot be guaranteed, but techniques proposed for state-space reduction and rapid resolution of undetermined policies should render many problems tractable.
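For orientation, the unconstrained baseline that the abstract builds on can be sketched as Howard-style policy iteration: alternate a value-determination step (solve the linear equations for the current policy) with a policy-improvement step. This is only a minimal sketch under stated assumptions. It uses the discounted-return formulation with a discount factor `beta`, which is an assumption made here for simplicity. It does not implement the policy constraint, the state-space reduction, or the resolution techniques of the paper, none of which the abstract defines. The names `policy_iteration`, `solve_linear`, `P`, `R`, and `beta` are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def policy_iteration(
    P: List[Matrix], R: Matrix, beta: float = 0.9
) -> Tuple[List[int], List[float]]:
    """Unconstrained policy iteration for a discounted MDP (assumed setting).

    P[a][i][j] is the probability of moving from state i to j under action a.
    R[a][i] is the expected one-step reward for action a in state i.
    """
    n_actions, n_states = len(P), len(P[0])
    policy = [0] * n_states  # arbitrary starting policy
    while True:
        # Value determination: solve (I - beta * P_pi) v = r_pi.
        a_mat = [
            [(1.0 if i == j else 0.0) - beta * P[policy[i]][i][j]
             for j in range(n_states)]
            for i in range(n_states)
        ]
        r_pi = [R[policy[i]][i] for i in range(n_states)]
        v = solve_linear(a_mat, r_pi)

        # Policy improvement: switch only on a strict gain, so ties keep
        # the current action and the loop cannot cycle between equals.
        new_policy = []
        for i in range(n_states):
            q = [
                R[a][i] + beta * sum(P[a][i][j] * v[j] for j in range(n_states))
                for a in range(n_actions)
            ]
            best = policy[i]
            for a in range(n_actions):
                if q[a] > q[best] + 1e-12:
                    best = a
            new_policy.append(best)

        if new_policy == policy:
            return policy, v
        policy = new_policy
```

In this unconstrained, finite, discounted setting the loop terminates at an optimal policy. The abstract notes that this guarantee is lost once the admissible policies are restricted, which is why the paper proposes additional techniques.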
Systems, Man and Cybernetics, IEEE Transactions on (Volume: SMC-1, Issue: 1)
Date of Publication: Jan. 1971