This note presents a formal method for improving a given base policy in constrained stochastic dynamic programming such that the resulting policy performs no worse than the base policy at every state. We consider the finite-horizon and discounted infinite-horizon cases. The improvement method induces a policy-iteration-type algorithm that converges to a locally optimal policy.
Published in: IEEE Transactions on Automatic Control (Volume: 51, Issue: 9)
Date of Publication: Sept. 2006