Recursive Algorithms for Adaptive Control of Finite Markov Chains

1 Author(s)

The problem of controlling a finite Markov chain so as to maximize the long-run expected reward per unit time is studied. The chain's transition probabilities depend on an unknown parameter taking values in a subset [a,b] of R^n. A control policy is defined as the probability of selecting each control action in each state of the chain. A Taylor-like expansion formula for the expected reward in terms of policy variations is derived. Based on that result, a recursive stochastic gradient algorithm is presented for adapting the control policy at consecutive times. The gradient depends on the estimated transition parameter, which is itself recursively updated using the gradient of the likelihood function. Convergence with probability 1 is proved for both the control and estimation algorithms.
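The coupled recursion described in the abstract — a stochastic-gradient update of the policy driven by an estimate of the transition parameter, alongside a likelihood-gradient update of that estimate — can be illustrated with a toy sketch. Everything below (the two-state chain, the one-parameter transition family `p_up`, the step sizes, the interval [0, 0.55]) is an invented illustration, not the paper's actual model or algorithm:

```python
import math
import random

random.seed(0)

# Illustrative setup: two states {0, 1}, two actions {0, 1},
# reward 1 for landing in state 1, 0 otherwise.
THETA_TRUE = 0.4  # the unknown transition parameter, lying in [a, b] = [0, 0.55]

def p_up(theta, action):
    """P(next state = 1 | action; theta): a simple one-parameter Bernoulli family."""
    return min(0.95, max(0.05, theta + 0.4 * action))

logits = [0.0, 0.0]   # per-state logit of choosing action 1 (the randomized policy)
theta_hat = 0.1       # recursive estimate of the transition parameter
state = 0

for t in range(1, 20001):
    p1 = 1.0 / (1.0 + math.exp(-logits[state]))
    action = 1 if random.random() < p1 else 0
    nxt = 1 if random.random() < p_up(THETA_TRUE, action) else 0

    # Estimation step: stochastic gradient of the log-likelihood of the
    # observed transition, Robbins-Monro step size 1/t, projected back to [a, b].
    p = p_up(theta_hat, action)
    grad_ll = 1.0 / p if nxt == 1 else -1.0 / (1.0 - p)
    theta_hat = min(0.55, max(0.0, theta_hat + grad_ll / t))

    # Control step: ascend the gradient of the expected reward computed under
    # the *estimated* model, i.e. the gradient depends on theta_hat.
    grad_reward = p1 * (1.0 - p1) * (p_up(theta_hat, 1) - p_up(theta_hat, 0))
    logits[state] += 0.05 * grad_reward

    state = nxt
```

Under this toy model, theta_hat settles near THETA_TRUE and the policy logits drift toward always selecting action 1, the action that makes the rewarding state more likely — mirroring, in miniature, the joint convergence of the estimation and control recursions that the paper establishes with probability 1.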

Published in:

IEEE Transactions on Systems, Man, and Cybernetics (Volume: 11, Issue: 2)