A modified actor-critic reinforcement learning algorithm

2 Author(s)
Mustapha, S.M. ; Dept. of Electr. Eng. & Comput. Eng., Sherbrooke Univ., Que., Canada ; Lachiver, G.

This paper proposes a fast and efficient actor-critic reinforcement learning algorithm that is novel in at least two ways: it updates the critic only when the best action is executed, and it takes full advantage of the powerful temporal difference (TD) prediction method to train a continuous-valued actor. The actor and the critic are represented separately by two adaptive neural fuzzy systems, each tuned by a backpropagation algorithm. The critic adapts to the actor by minimizing the sum of squared TD errors, while the actor adapts to the critic using not only the TD error but also the state value function. The new actor-critic architecture is applied to an inverted pendulum system, which is widely used to compare reinforcement learning architectures.
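The abstract gives no update equations, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes linear function approximators in place of the paper's adaptive neural fuzzy systems, a one-step TD error, and a generic exploration-driven actor rule; it also omits the paper's use of the state value function in the actor update. All names (`td_error`, `actor_critic_step`, `noise`, the learning rates) are hypothetical.

```python
import numpy as np


def td_error(reward, v, v_next, gamma=0.95):
    """One-step TD error: r + gamma * V(s') - V(s)."""
    return reward + gamma * v_next - v


def actor_critic_step(w_critic, w_actor, phi, phi_next, reward, noise,
                      gamma=0.95, lr_critic=0.1, lr_actor=0.05):
    """One update for a linear critic V(s) = w_critic . phi(s) and a linear,
    continuous-valued actor a(s) = w_actor . phi(s).

    `noise` is the exploration term added to the actor's output before the
    action was executed; noise == 0.0 means the actor's own best action
    was executed. (Assumed convention, not taken from the paper.)
    """
    v = w_critic @ phi
    v_next = w_critic @ phi_next
    delta = td_error(reward, v, v_next, gamma)

    if noise == 0.0:
        # Critic learns only when the best action was executed:
        # semi-gradient step that reduces 0.5 * delta**2.
        w_critic = w_critic + lr_critic * delta * phi
    else:
        # Assumed actor rule: shift the output toward the explored action
        # when the TD error is positive, away from it when negative.
        w_actor = w_actor + lr_actor * delta * noise * phi

    return w_critic, w_actor, delta


# Usage with two hypothetical one-hot state features.
phi, phi_next = np.array([1.0, 0.0]), np.array([0.0, 1.0])
w_c, w_a = np.zeros(2), np.zeros(2)

# Best action executed: only the critic weights change.
w_c_greedy, w_a_greedy, d_greedy = actor_critic_step(
    w_c, w_a, phi, phi_next, reward=1.0, noise=0.0)

# Exploratory action executed: only the actor weights change.
w_c_explore, w_a_explore, d_explore = actor_critic_step(
    w_c, w_a, phi, phi_next, reward=1.0, noise=0.5)
```

Gating the critic update on the executed action keeps the value estimate tied to the actor's current policy rather than to exploratory deviations, which is one plausible reading of why the abstract highlights that choice.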

Published in:

Electrical and Computer Engineering, 2000 Canadian Conference on (Volume: 2)

Date of Conference: