An asymptotically optimal learning controller for finite Markov chains with unknown transition probabilities

Authors: M. Sato (Tohoku University, Sendai, Japan); K. Abe; H. Takeda

A learning controller is presented for a Markovian decision problem in which the transition probabilities are unknown. The controller is designed to be asymptotically optimal while accounting for the conflict between estimation and control: it determines its control policy using a performance criterion that explicitly incorporates the tradeoff between the two. It is shown that this controller achieves asymptotic optimality in the sense that the relative frequency of applying the optimal policy converges to unity.
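To make the setting concrete, here is a minimal Python sketch of a learning controller for a small finite Markov chain with unknown transition probabilities. It is an illustration only and is not the controller from the paper, whose performance criterion is not given in the abstract. Everything specific here is an assumption: the two-state chain `TRUE_P`, the rewards `REWARD`, the discounted criterion with `gamma=0.9`, the count-based estimate with a uniform prior, the re-planning interval, and the `1/sqrt(t)` exploration rate that stands in for the estimation/control tradeoff. The function returns the relative frequency with which the applied policy equals the optimal one.

```python
import random

# Hypothetical 2-state, 2-action chain (an assumption, not from the paper).
# TRUE_P[s][a][t] = probability of moving from state s to state t under action a.
TRUE_P = [
    [[0.1, 0.9], [0.9, 0.1]],
    [[0.1, 0.9], [0.9, 0.1]],
]
# REWARD[s][a]: state 1 pays 1, state 0 pays 0, regardless of the action.
REWARD = [[0.0, 0.0], [1.0, 1.0]]


def q_value(P, R, V, s, a, gamma):
    """One-step lookahead value of taking action a in state s."""
    return R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))


def greedy_policy(P, R, gamma=0.9, sweeps=200):
    """Value iteration on model P, then the greedy action for each state."""
    n_states, n_actions = len(P), len(P[0])
    V = [0.0] * n_states
    for _ in range(sweeps):
        V = [max(q_value(P, R, V, s, a, gamma) for a in range(n_actions))
             for s in range(n_states)]
    return [max(range(n_actions), key=lambda a: q_value(P, R, V, s, a, gamma))
            for s in range(n_states)]


def run_controller(true_P, R, steps=5000, replan_every=50, seed=0):
    """Simulate the controller; return how often its policy was the optimal one."""
    rng = random.Random(seed)
    n_states, n_actions = len(true_P), len(true_P[0])
    # Transition counts, initialised to 1 (uniform prior) to avoid division by zero.
    counts = [[[1] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    optimal = greedy_policy(true_P, R)  # reference policy, used only for scoring
    policy = [0] * n_states
    state, hits = 0, 0
    for t in range(1, steps + 1):
        if t % replan_every == 1:
            # Estimation: relative frequencies of the observed transitions.
            est_P = [[[c / sum(row) for c in row] for row in counts[s]]
                     for s in range(n_states)]
            # Control: act as if the current estimate were the true model.
            policy = greedy_policy(est_P, R)
        hits += policy == optimal
        # Decaying exploration keeps every action sampled while favouring control.
        explore = rng.random() < t ** -0.5
        action = rng.randrange(n_actions) if explore else policy[state]
        next_state = rng.choices(range(n_states), weights=true_P[state][action])[0]
        counts[state][action][next_state] += 1
        state = next_state
    return hits / steps


if __name__ == "__main__":
    print(run_controller(TRUE_P, REWARD))
```

In this toy chain the optimal policy is action 0 in both states, so the returned frequency should approach 1 as `steps` grows. The sketch checks that behaviour empirically and proves nothing, unlike the convergence result stated in the abstract.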

Published in: IEEE Transactions on Automatic Control (Volume: 30, Issue: 11)