Algorithms for variance reduction in a policy-gradient based actor-critic framework

Author: Awate, Y.P. (marketRx - A Cognizant Co., Gurgaon)

We consider the framework of a set of recently proposed two-timescale actor-critic algorithms for reinforcement learning (RL) under the long-run average-reward criterion with linear, feature-based value-function approximation. The actor and critic updates are based on stochastic policy-gradient ascent and temporal-difference algorithms, respectively. Unlike conventional RL algorithms, policy-gradient-based algorithms guarantee convergence even with value-function approximation, but they suffer from high variance in the policy-gradient estimator. To reduce this variance for an existing algorithm, we derive a novel stochastic-gradient-based critic update. We propose a novel baseline structure for variance minimization of an estimator and derive the optimal baseline, which makes the covariance matrix the zero matrix, the best achievable. Using the optimal baseline deduced for an existing algorithm, we derive a novel actor update. We derive another novel actor update using the optimal baseline for an unbiased policy-gradient estimator, which we deduce from the policy-gradient theorem with function approximation. We also obtain a novel variance-minimization-based interpretation of an existing algorithm. Computational results demonstrate that the proposed algorithms outperform the state of the art on Garnet problems.
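To make the setting concrete, the sketch below shows a minimal two-timescale actor-critic loop under the average-reward criterion with a linear critic and a state-dependent baseline, on a small random MDP standing in for a Garnet problem. The environment, feature map, softmax policy, and step sizes are illustrative assumptions, and the standard state-value baseline is used in place of the paper's optimal-baseline and novel actor/critic updates, which are not reproduced here.

```python
import numpy as np

# Hedged sketch: generic average-reward actor-critic with a linear critic and a
# state-value baseline. All specifics (MDP, features, step sizes) are assumptions
# for illustration, not the paper's algorithms or its Garnet benchmark.

rng = np.random.default_rng(0)

n_states, n_actions, n_feats = 5, 2, 4

# Small random tabular MDP.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # transition probs
R = rng.normal(size=(n_states, n_actions))                        # rewards

phi = rng.normal(size=(n_states, n_feats))   # critic features phi(s)
theta = np.zeros((n_states, n_actions))      # actor parameters (softmax policy)
v = np.zeros(n_feats)                        # critic weights: V(s) ~ phi(s) @ v
rho = 0.0                                    # running average-reward estimate

alpha, beta, kappa = 0.01, 0.05, 0.01        # actor / critic / average-reward step sizes

def policy(s):
    # Softmax policy over actions in state s.
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

s = 0
for t in range(20000):
    pi = policy(s)
    a = rng.choice(n_actions, p=pi)
    r = R[s, a]
    s_next = rng.choice(n_states, p=P[s, a])

    # Average-reward TD error; the critic's value estimate V(s) = phi(s) @ v also
    # acts as the state-dependent baseline subtracted in the actor update.
    delta = r - rho + phi[s_next] @ v - phi[s] @ v

    rho += kappa * (r - rho)        # track the long-run average reward
    v += beta * delta * phi[s]      # critic: TD(0) update of the linear value function

    # Actor: stochastic policy-gradient ascent, using the TD error as the
    # baseline-adjusted advantage estimate; grad log pi(a|s) for a softmax policy.
    grad_log_pi = -pi
    grad_log_pi[a] += 1.0
    theta[s] += alpha * delta * grad_log_pi

    s = s_next

print("estimated average reward:", rho)
```

The baseline here only reduces variance without biasing the gradient, since the expectation of the score function is zero; the paper's contribution, per the abstract, is to optimize that baseline structure rather than fix it to the state value as done above.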

Published in: 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL '09)

Date of Conference: March 30 - April 2, 2009
