A learning scheme is presented for a Markovian decision problem in which the unknown transition probabilities are governed by a periodically varying parameter with period T. Under this scheme, the unknown parameter is re-estimated every T time steps, and a policy sequence to be applied over the next T time steps is then determined. It is shown that the estimate converges to the true value as time evolves, and accordingly the scheme asymptotically attains control that is β-optimal in a broad sense.
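The estimate-then-plan cycle described in the abstract can be illustrated with a generic certainty-equivalence loop. The sketch below is not the paper's algorithm: it assumes a finite state and action space, treats the full per-phase transition table as the unknown parameter, estimates it from Laplace-smoothed counts, adds ε-greedy exploration, and plans by value iteration over the T phases with discount factor β. The function names `plan_periodic` and `run` and all parameter names are hypothetical.

```python
import numpy as np

def plan_periodic(P, R, beta, sweeps=100):
    """Value iteration for a T-periodic MDP (illustrative assumption).

    P: (T, A, S, S) transition probabilities, one table per phase of the period.
    R: (S, A) one-step rewards. Returns a (T, S) array of greedy actions,
    i.e. one policy per phase (a policy sequence of length T).
    """
    T, A, S, _ = P.shape
    V = np.zeros((T, S))
    for _ in range(sweeps):
        for k in reversed(range(T)):
            # Q[s, a] = R[s, a] + beta * sum_t P[k, a, s, t] * V[(k + 1) % T, t]
            Q = R + beta * np.einsum("ast,t->sa", P[k], V[(k + 1) % T])
            V[k] = Q.max(axis=1)
    policy = np.empty((T, S), dtype=int)
    for k in range(T):
        Q = R + beta * np.einsum("ast,t->sa", P[k], V[(k + 1) % T])
        policy[k] = Q.argmax(axis=1)
    return policy

def run(P_true, R, beta=0.9, periods=500, eps=0.1, seed=0):
    """Simulate the loop: act for T steps, then re-estimate and re-plan."""
    rng = np.random.default_rng(seed)
    T, A, S, _ = P_true.shape
    counts = np.ones((T, A, S, S))         # Laplace-smoothed transition counts
    policy = rng.integers(A, size=(T, S))  # arbitrary initial policy sequence
    s = 0
    for _ in range(periods):
        for k in range(T):
            # eps-greedy exploration (an assumption, not taken from the abstract)
            a = rng.integers(A) if rng.random() < eps else policy[k, s]
            s_next = rng.choice(S, p=P_true[k, a, s])
            counts[k, a, s, s_next] += 1
            s = s_next
        # Every T steps: update the estimate, then plan the next T policies.
        P_hat = counts / counts.sum(axis=3, keepdims=True)
        policy = plan_periodic(P_hat, R, beta)
    return P_hat, policy
```

Calling `run(P_true, R)` with a `(T, A, S, S)` array whose rows sum to one returns the final estimate and the policy sequence planned for the next period. Replanning from scratch each period keeps the sketch short; warm-starting the value function would be cheaper.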
Published in: Proceedings of the 29th IEEE Conference on Decision and Control, 1990
Date of Conference: 5-7 Dec 1990
- Page(s): 1457-1458 vol.3
- Meeting Date: 05 Dec 1990-07 Dec 1990
- INSPEC Accession Number: 4038127
- Conference Location: Honolulu, HI
- Digital Object Identifier: 10.1109/CDC.1990.203852
- Product Type: Conference Publications