A convergent recursive least squares approximate policy iteration algorithm for multi-dimensional Markov decision process with continuous state and action spaces


Authors:

Jun Ma (Department of Operations Research and Financial Engineering, Princeton University, New Jersey, USA); Warren B. Powell

In this paper, we present a recursive least squares approximate policy iteration (RLSAPI) algorithm for infinite-horizon, multi-dimensional Markov decision processes with continuous state and action spaces. Under certain structural assumptions on the value functions and policy spaces, the approximate policy iteration algorithm is provably convergent in the mean: the mean absolute deviation of the approximate policy value function from the optimal value function goes to zero as the successive approximations improve.
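The core numerical tool the abstract names is recursive least squares (RLS): the value function is approximated as a linear combination of basis functions, and the weights are updated one observation at a time without re-solving a batch regression. The sketch below shows only that generic RLS weight update, not the paper's full policy iteration scheme; the feature map, the forgetting factor `lam`, and the demo data are illustrative assumptions.

```python
import numpy as np

def rls_update(theta, P, phi, target, lam=1.0):
    """One recursive least squares step for a linear value model V(s) = phi(s) . theta.

    theta  : current weight vector, shape (d,)
    P      : current inverse-covariance matrix, shape (d, d)
    phi    : feature vector of the observed state, shape (d,)
    target : observed value estimate for that state (scalar)
    lam    : forgetting factor in (0, 1]; 1.0 weights all samples equally
    """
    phi = phi.reshape(-1, 1)                      # column vector (d, 1)
    Pphi = P @ phi                                # P * phi
    k = Pphi / (lam + (phi.T @ Pphi).item())      # gain vector (d, 1)
    err = target - float(phi.ravel() @ theta)     # prediction error
    theta = theta + (k * err).ravel()             # weight correction
    P = (P - k @ Pphi.T) / lam                    # rank-one inverse update
    return theta, P

# Toy demo (hypothetical data): recover V(s) = 2 - s with features phi(s) = [1, s].
d = 2
theta = np.zeros(d)
P = 1e6 * np.eye(d)                               # large P0 ~ uninformative prior
for s in [0.0, 1.0, 2.0, 3.0]:
    theta, P = rls_update(theta, P, np.array([1.0, s]), 2.0 - s)
print(theta)                                      # approaches [2, -1]
```

Because each step costs only O(d^2) work and needs no stored history, this update is what makes the "recursive" part of RLSAPI practical inside a simulation-based policy evaluation loop.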

Published in:

2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Date of Conference:

March 30 – April 2, 2009