Skip to Main Content
A learning controller is presented for a Markovian decision problem in which the transition probabilities are unknown. This controller, which is designed to be asymptotically optimal with consideration of a conflict between estimation and control, uses a performance criterion incorporating a tradeoff between them explicitly for determination of a control policy. It is shown that this controller achieves asymptotic optimality in the sense that the relative frequency of applying the optimal policy converges to unity.
Date of Publication: Nov 1985