Approximate Robust Policy Iteration for Discounted Infinite-Horizon Markov Decision Processes with Uncertain Stationary Parametric Transition Matrices

2 Author(s)
Baohua Li (Department of Electrical Engineering, Arizona State University, Tempe, AZ 85287-5706, USA); Jennie Si

We consider Markov decision processes with finite states, finite actions, and discounted infinite-horizon cost in the deterministic policy space. The state transition matrices are uncertain but have a stationary parameterization. This uncertainty reflects the realistic situation in which an accurate system model is not available for controller design, owing to limitations in estimation methods and to model deficiencies. Based on the quadratic total value function formulation, two approximate robust policy iterations are developed, whose performance errors are guaranteed to lie within an arbitrarily small error bound. The two approximations use iterative aggregation and a multilayer perceptron, respectively. It is proved that the robust policy iteration based on iterative aggregation converges surely to a stationary optimal or near-optimal policy, and that, under some conditions, the robust policy iteration based on the multilayer perceptron converges in a probabilistic sense to a stationary near-optimal policy. Furthermore, under some assumptions, the stationary solutions are guaranteed to be near-optimal in the deterministic policy space.
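To make the setting concrete, the following is a minimal sketch of exact robust policy iteration on a toy problem. It is not the authors' algorithm. It assumes the uncertainty is a small finite set of candidate transition models and lets the worst case be chosen independently at each state (a rectangular relaxation), rather than using the stationary parametric formulation, the quadratic total value function, or the aggregation and multilayer perceptron approximations described in the abstract. The names `COST`, `MODELS`, `GAMMA`, and all function names are hypothetical.

```python
# Toy robust policy iteration for a finite, discounted-cost MDP.
# Assumption (not from the paper): uncertainty is a finite list of
# candidate transition models, and the worst case is taken per state.

GAMMA = 0.9  # discount factor (hypothetical value)

# COST[s][a]: one-step cost of action a in state s (hypothetical numbers).
COST = [[1.0, 2.0],
        [0.0, 0.5]]

# MODELS[k][s][a]: next-state distribution under candidate model k.
MODELS = [
    [[[0.5, 0.5], [0.1, 0.9]], [[0.0, 1.0], [0.0, 1.0]]],
    [[[0.9, 0.1], [0.2, 0.8]], [[0.0, 1.0], [0.0, 1.0]]],
]


def robust_backup(s, a, v, cost, models, gamma):
    """Largest (worst-case) expected cost of action a in state s."""
    return max(
        cost[s][a] + gamma * sum(p * v[t] for t, p in enumerate(m[s][a]))
        for m in models
    )


def evaluate(policy, cost, models, gamma, tol=1e-10, max_iter=100_000):
    """Worst-case value of a fixed policy by repeated robust backups."""
    v = [0.0] * len(cost)
    for _ in range(max_iter):
        new_v = [robust_backup(s, policy[s], v, cost, models, gamma)
                 for s in range(len(cost))]
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


def robust_policy_iteration(cost, models, gamma, max_rounds=1000):
    """Alternate worst-case evaluation and greedy improvement."""
    n_states, n_actions = len(cost), len(cost[0])
    policy = [0] * n_states
    v = evaluate(policy, cost, models, gamma)
    for _ in range(max_rounds):
        new_policy = list(policy)
        for s in range(n_states):
            best = robust_backup(s, policy[s], v, cost, models, gamma)
            for a in range(n_actions):
                q = robust_backup(s, a, v, cost, models, gamma)
                # Switch only on a strict improvement to avoid cycling on ties.
                if q < best - 1e-9:
                    best, new_policy[s] = q, a
        if new_policy == policy:
            break
        policy = new_policy
        v = evaluate(policy, cost, models, gamma)
    return policy, v


policy, values = robust_policy_iteration(COST, MODELS, GAMMA)
print(policy, values)
```

In this toy instance the first candidate model alone would favor action 0 in state 0, while the worst-case criterion selects action 1, which illustrates why model uncertainty can change the preferred policy.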

Published in:

2007 International Joint Conference on Neural Networks

Date of Conference:

12-17 Aug. 2007