More on training strategies for critic and action neural networks in dual heuristic programming method

3 Author(s)
Lendaris, G.G. ; Dept. of Syst. Sci. & Electr. Eng., Portland State Univ., OR, USA ; Paintz, C. ; Shannon, T.

The article describes a modification to the usual procedures for training the critic and action neural networks in the dual heuristic programming (DHP) method (D. Prokhorov and D. Wunsch, 1996; R. Santiago, 1995; P. Werbos, 1994). The modification updates both the critic and the action networks at each computational cycle, rather than only one at a time. The distinction lies in the introduction of a (real) second copy of the critic network whose weights are adjusted less often; the “desired value” for training the other critic is obtained from this copy. Previously (G. Lendaris and C. Paintz, 1997), the proposed modified training strategy was demonstrated on the pole-cart controller problem: the full 6-dimensional state vector was input to the critic and action NNs, but the utility function involved only the pole angle, not the distance along the track (x). For the first set of results presented here, the 3 states associated with the x variable were eliminated from the inputs to the NNs, keeping the previously defined utility function. This resulted in improved learning and controller performance. From this point, the method is applied to two additional problems of increasing complexity. In the first, an x-related term is added to the utility function for the pole-cart problem and, simultaneously, the x-related states are added back into the NNs (i.e., the number of state variables used increases from 3 to 6). The second relates to steering a vehicle with independent drive motors on each wheel. The problem contexts and experimental results are provided.
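The training strategy in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes a hypothetical scalar linear plant, a quadratic utility, and single-weight linear "networks" in place of the critic and action NNs. The function name, the constants, the learning rates, and the sync interval are all illustrative assumptions. So is the choice to drive the action update from the live critic rather than from the copy. The sketch only shows the two ideas named above: both networks are adjusted in every cycle, and the critic's desired value comes from a second critic copy that is refreshed less often.

```python
import random


def train_dhp_sketch(steps=20000, sync_every=20, seed=0):
    """Toy DHP-style loop with a slowly updated critic copy (illustrative only)."""
    # Hypothetical scalar plant x' = a*x + b*u and utility U = q*x^2 + r*u^2.
    a, b = 0.9, 0.5
    q, r = 1.0, 0.1
    gamma = 0.9        # discount factor (assumed)
    lr_critic = 0.05   # learning rates (assumed)
    lr_action = 0.05

    w = 0.0        # live critic: lambda(x) = w * x estimates dJ/dx
    w_copy = 0.0   # second critic copy, adjusted less often
    k = 0.0        # action "network": u = -k * x

    rng = random.Random(seed)
    for t in range(steps):
        x = rng.uniform(-1.0, 1.0)   # sampled training state
        u = -k * x
        x_next = a * x + b * u

        # Desired critic value, built from the slowly updated copy:
        # dU/dx + dU/du * du/dx + gamma * lambda_copy(x') * dx'/dx (total).
        dxnext_dx = a - b * k
        target = 2 * q * x + 2 * r * u * (-k) + gamma * (w_copy * x_next) * dxnext_dx

        # Action training signal dU/du + gamma * lambda(x') * dx'/du.
        # Assumption: the live critic supplies lambda here.
        action_grad = 2 * r * u + gamma * (w * x_next) * b

        # Both critic and action are adjusted in the same cycle.
        w += lr_critic * (target - w * x) * x
        k += lr_action * action_grad * x   # du/dk = -x, so descent adds this term

        # Refresh the critic copy only every sync_every cycles.
        if (t + 1) % sync_every == 0:
            w_copy = w
    return w, w_copy, k


if __name__ == "__main__":
    w, w_copy, k = train_dhp_sketch()
    print("critic weight:", w, "copy:", w_copy, "action gain:", k)
```

In this toy setting the copy acts as a fixed target between refreshes, so the live critic is never trained against values that move in the same cycle. The pole-cart and vehicle-steering problems in the paper use neural networks and nonlinear plants, which this sketch does not attempt to reproduce.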

Published in:

Systems, Man, and Cybernetics, 1997. Computational Cybernetics and Simulation, 1997 IEEE International Conference on (Volume: 4)

Date of Conference:

12-15 Oct 1997