Asymptotically efficient adaptive allocation schemes for controlled Markov chains: finite parameter space
Agrawal, R.; Teneketzis, D.; Anantharam, V.
Automatic Control, IEEE Transactions on
Volume 34, Issue 12, Dec 1989 Page(s):1249 - 1259
Digital Object Identifier 10.1109/9.40770
Summary: The authors consider a controlled Markov chain whose transition
probabilities and initial distribution are parametrized by an unknown
parameter θ belonging to some known parameter space Θ.
There is a one-step reward associated with each pair of control and the
following state of the process. The objective is to maximize the
expected value of the sum of one-step rewards over an infinite horizon.
The loss associated with a control scheme at a parameter value is the
function of time giving the difference between the maximum reward that
could have been achieved if the parameter were known and the reward
achieved by the scheme. Since it is impossible to minimize the loss
uniformly for all parameter values, the authors define uniformly good
adaptive control schemes and restrict attention to these schemes. They
develop a lower bound on the loss associated with any uniformly good
control scheme. They construct an adaptive control scheme whose loss
equals the lower bound for every parameter value and is therefore
asymptotically efficient.