By Topic

Two Novel On-policy Reinforcement Learning Algorithms based on TD(λ)-methods

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

2 Author(s)
Wiering, M.A. ; Dept. of Inf. & Comput. Sci., Utrecht Univ. ; van Hasselt, H.

This paper describes two novel on-policy reinforcement learning algorithms, named QV(λ)-learning and the actor critic learning automaton (ACLA). Both algorithms learn a state value-function using TD(λ)-methods. The difference between the algorithms is that QV-learning uses the learned value function and a form of Q-learning to learn Q-values, whereas ACLA uses the value function and a learning automaton-like update rule to update the actor. We describe several possible advantages of these methods compared to other value-function-based reinforcement learning algorithms such as Q-learning, Sarsa, and conventional actor-critic methods. Experiments are performed on (1) small, (2) large, (3) partially observable, and (4) dynamic maze problems with tabular and neural network value-function representations, and on the mountain car problem. The overall results show that the two novel algorithms can outperform previously known reinforcement learning algorithms

Published in:

Approximate Dynamic Programming and Reinforcement Learning, 2007. ADPRL 2007. IEEE International Symposium on

Date of Conference:

1-5 April 2007