Consider a controlled Markov chain whose transition probabilities depend upon an unknown parameter α taking values in a finite set A. To each α is associated a prespecified stationary control law φ(α). The adaptive control law selects at each time t the control action indicated by φ(α_t), where α_t is the maximum likelihood estimate of α. It is shown that α_t converges to a parameter α* such that the "closed loop" transition probabilities corresponding to α* and φ(α*) are the same as those corresponding to α_0 and φ(α*), where α_0 is the true parameter. The situation when α_0 does not belong to the model set A is briefly discussed.
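
The scheme described above can be illustrated with a small simulation. What follows is only a minimal sketch under stated assumptions, not the authors' construction: the function names (`mle_adaptive_control`, `same_closed_loop`), the two-state chain, the two candidate parameters "a" and "b", and all transition probabilities are hypothetical and chosen for illustration. The example is built so that both candidates give the same transition probabilities under the action prescribed for "a", which lets the estimate settle on "a" even though "b" is the true parameter, while the closed-loop transition probabilities still agree.

```python
import math
import random


def mle_adaptive_control(P, phi, true_alpha, steps, x0=0, seed=0):
    """Apply phi(alpha_t) at each step, where alpha_t is the current MLE.

    P[alpha][u][i][j]: probability of moving i -> j under action u
                       when the parameter is alpha (hypothetical layout).
    phi[alpha][i]:     action prescribed by the law phi(alpha) in state i.
    """
    rng = random.Random(seed)
    params = list(P)                    # the finite model set A
    loglik = {a: 0.0 for a in params}   # running log-likelihood per candidate
    x = x0
    estimates = []
    for _ in range(steps):
        # Maximum likelihood estimate; ties go to the earliest candidate.
        alpha_t = max(params, key=lambda a: loglik[a])
        estimates.append(alpha_t)
        u = phi[alpha_t][x]
        # The chain itself evolves according to the true parameter.
        row = P[true_alpha][u][x]
        x_next = rng.choices(range(len(row)), weights=row)[0]
        # Update every candidate's likelihood with the observed transition.
        for a in params:
            p = P[a][u][x][x_next]
            loglik[a] += math.log(p) if p > 0 else -math.inf
        x = x_next
    return estimates


def same_closed_loop(P, phi, alpha_star, true_alpha, tol=1e-12):
    """Check that alpha_star and true_alpha agree under the law phi(alpha_star)."""
    law = phi[alpha_star]
    n = len(law)
    return all(
        abs(P[alpha_star][law[i]][i][j] - P[true_alpha][law[i]][i][j]) <= tol
        for i in range(n)
        for j in range(n)
    )


# Illustrative data (assumed): the candidates differ only under action 1.
P = {
    "a": [[[0.5, 0.5], [0.5, 0.5]],   # action 0
          [[0.9, 0.1], [0.9, 0.1]]],  # action 1
    "b": [[[0.5, 0.5], [0.5, 0.5]],   # action 0 (identical to "a")
          [[0.1, 0.9], [0.1, 0.9]]],  # action 1
}
phi = {"a": [0, 0], "b": [1, 1]}      # phi("a") always plays 0, phi("b") always plays 1

estimates = mle_adaptive_control(P, phi, true_alpha="b", steps=200)
alpha_star = estimates[-1]
```

In this run the two log-likelihoods stay tied, because only action 0 is ever applied and it cannot distinguish the candidates. The estimate therefore remains at "a", and `same_closed_loop(P, phi, alpha_star, "b")` holds even though `alpha_star` differs from the true parameter.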