A reinforcement learning based algorithm for Markov decision processes

Author(s):

Bhatnagar, S. (Dept. of Comput. Sci. & Autom., Indian Inst. of Sci., Bangalore, India); Kumar, S.

This paper proposes a variant of a recently proposed two-timescale, reinforcement-learning-based actor-critic algorithm for infinite-horizon discounted-cost Markov decision processes with finite state spaces and compact action spaces. On the faster timescale, the value function corresponding to a given stationary deterministic policy is updated and averaged, while the policy itself is updated on the slower timescale. The policy recursion uses the sign of the gradient estimate instead of the estimate itself. A potential advantage of using the sign function is significantly reduced computation and communication overhead in applications such as congestion control in communication networks and distributed computation. The convergence analysis of the algorithm is briefly sketched, and numerical experiments on a congestion control problem are presented.
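The two-timescale structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify the gradient estimator, step sizes, or test problem, so everything here is an assumption chosen for illustration. The toy simulator `simulate`, the cost minimizers `TARGETS`, the two-sided simultaneous-perturbation gradient estimate, and the use of an inner loop of critic sweeps to emulate the faster timescale are all hypothetical.

```python
import numpy as np

TARGETS = np.array([0.2, 0.5, 0.8])  # hypothetical per-state cost minimizers
N_STATES = len(TARGETS)

def simulate(state, action):
    """Toy simulator (assumption): quadratic cost, deterministic cyclic transition."""
    cost = (action - TARGETS[state]) ** 2
    next_state = (state + 1) % N_STATES
    return cost, next_state

def critic_sweeps(value, actions, gamma, step, n_sweeps):
    """Faster timescale: averaged TD(0)-style value updates for a fixed policy."""
    for _ in range(n_sweeps):
        for s in range(N_STATES):
            cost, s_next = simulate(s, actions[s])
            value[s] += step * (cost + gamma * value[s_next] - value[s])
    return value

def sign_actor_critic(n_iters=400, gamma=0.5, delta=0.05, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.full(N_STATES, 0.5)   # deterministic policy: one action in [0, 1] per state
    v_plus = np.zeros(N_STATES)      # critic for the policy perturbed by +delta
    v_minus = np.zeros(N_STATES)     # critic for the policy perturbed by -delta
    for n in range(1, n_iters + 1):
        actor_step = 0.1 / n ** 0.6  # slowly decaying actor step size (assumed schedule)
        perturb = rng.choice([-1.0, 1.0], size=N_STATES)
        a_plus = np.clip(theta + delta * perturb, 0.0, 1.0)
        a_minus = np.clip(theta - delta * perturb, 0.0, 1.0)
        # Many critic sweeps per actor update stand in for the faster timescale.
        critic_sweeps(v_plus, a_plus, gamma, step=0.5, n_sweeps=50)
        critic_sweeps(v_minus, a_minus, gamma, step=0.5, n_sweeps=50)
        # Slower timescale: only the sign of the gradient estimate is used,
        # followed by projection back onto the compact action set [0, 1].
        grad_est = (v_plus - v_minus) / (2.0 * delta * perturb)
        theta = np.clip(theta - actor_step * np.sign(grad_est), 0.0, 1.0)
    return theta

if __name__ == "__main__":
    print(sign_actor_critic())
```

Because the actor consumes only `np.sign(grad_est)`, each policy component needs one of three symbols per update rather than a real-valued gradient, which is the kind of reduced communication the abstract refers to.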

Published in:

Proceedings of the 2005 International Conference on Intelligent Sensing and Information Processing

Date of Conference:

4-7 Jan. 2005