Convergence accelerated by the improvements of stepsize and gradient in SPSA

Authors: Zhang Huajun (Dept. of Control Sci. & Eng., Huazhong Univ. of Sci. & Technol. (HUST), Wuhan, China); Zhao Jin; Geng Tao

Simultaneous perturbation stochastic approximation (SPSA) is effective for optimizing complex systems in which the gradient of the objective function is difficult or impossible to obtain directly and only measurements of the objective function are available. SPSA uses these measurements to estimate the gradient efficiently. Many improvements have been proposed to accelerate its convergence. A typical one replaces the first-order gradient approximation of standard SPSA with a Newton-Raphson-style approximation. Although the resulting second-order SPSA (2SPSA) algorithm solves the optimization problem successfully through efficient gradient approximation, its accuracy depends on the conditioning of the Hessian of the objective function. To eliminate this dependence on the Hessian, this paper uses nonlinear conjugate gradient methods to determine the search direction. By combining different nonlinear conjugate gradient methods, it ensures that every search direction is a descent direction. In addition to the search direction, the paper improves how SPSA computes its stepsize: a suitable stepsize is calculated from the current and previous gradient information. With descent search directions and appropriate stepsizes, the improved SPSA converges faster than 2SPSA. Its advantages are validated through an application to reinforcement learning.
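The abstract notes that SPSA estimates the gradient from objective-function measurements alone. A minimal sketch of the standard first-order SPSA iteration, in Python with NumPy, shows how this works: every coordinate is perturbed at once by an independent random ±1 sign, and two measurements yield an estimate of the whole gradient vector. The function name `spsa_minimize` and the gain constants `a`, `c`, `A` are illustrative assumptions; the exponents 0.602 and 0.101 are the practical values commonly recommended for SPSA gain sequences, not values taken from this paper.

```python
import numpy as np


def spsa_minimize(f, theta0, n_iter=200, a=0.1, c=0.1, A=0.0,
                  alpha=0.602, gamma=0.101, seed=0):
    """Standard first-order SPSA (hypothetical helper, illustrative gains)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    for k in range(n_iter):
        a_k = a / (k + 1 + A) ** alpha   # decaying step-size gain
        c_k = c / (k + 1) ** gamma       # decaying perturbation size
        # Simultaneous perturbation: all coordinates move at once,
        # each with an independent Rademacher (+1/-1) sign.
        delta = rng.choice([-1.0, 1.0], size=theta.shape)
        y_plus = f(theta + c_k * delta)
        y_minus = f(theta - c_k * delta)
        # Two measurements give an estimate of the full gradient vector.
        g_hat = (y_plus - y_minus) / (2.0 * c_k * delta)
        theta = theta - a_k * g_hat
    return theta


def loss(t):
    # Example objective: a simple quadratic with minimum at (1, 1).
    return float(np.sum((t - 1.0) ** 2))


theta = spsa_minimize(loss, [3.0, -2.0])
print(theta, loss(theta))
```

The cost per iteration is two measurements regardless of dimension, which is the efficiency the abstract refers to; a finite-difference gradient would need two measurements per coordinate.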

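The abstract does not give the paper's exact conjugate-gradient combination or stepsize formula, so the following is a hypothetical sketch under stated assumptions, not the authors' method. The search direction uses a common hybrid of the Polak-Ribière-Polyak (PRP) and Fletcher-Reeves (FR) coefficients, beta = max(0, min(beta_PRP, beta_FR)), with a restart to steepest descent whenever the new direction fails the descent test against the estimated gradient. The stepsize is swapped in as the Barzilai-Borwein rule s^T s / s^T y, one standard way to derive a step from current and previous gradients. All names (`cg_spsa_minimize`, `spsa_gradient`, `n_avg`, `a_min`, `a_max`, `tol`) are hypothetical.

```python
import numpy as np


def spsa_gradient(f, theta, c, rng, n_avg=4):
    """Average n_avg simultaneous-perturbation gradient estimates (hypothetical)."""
    g = np.zeros_like(theta)
    for _ in range(n_avg):
        delta = rng.choice([-1.0, 1.0], size=theta.shape)
        g += (f(theta + c * delta) - f(theta - c * delta)) / (2.0 * c * delta)
    return g / n_avg


def cg_spsa_minimize(f, theta0, n_iter=100, c=0.1, a0=0.05,
                     a_min=1e-4, a_max=1.0, tol=1e-8, seed=0):
    """Hypothetical SPSA variant: hybrid PRP/FR direction + BB stepsize."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    g = spsa_gradient(f, theta, c, rng)
    d = -g          # first direction: steepest descent
    step = a0       # first stepsize: fixed initial guess
    for _ in range(n_iter):
        if np.linalg.norm(g) <= tol:   # estimated gradient ~ 0: stop
            break
        theta_new = theta + step * d
        g_new = spsa_gradient(f, theta_new, c, rng)
        # Stepsize from current and former gradient information
        # (Barzilai-Borwein), clipped because SPSA gradients are noisy.
        s, y = theta_new - theta, g_new - g
        sy = s @ y
        step = float(np.clip(s @ s / sy, a_min, a_max)) if sy > 0 else a0
        # Hybrid conjugate-gradient coefficient: PRP clipped into [0, FR].
        gg = g @ g                      # > tol**2, so no division by zero
        beta_fr = (g_new @ g_new) / gg
        beta_prp = (g_new @ (g_new - g)) / gg
        beta = max(0.0, min(beta_prp, beta_fr))
        d = -g_new + beta * d
        # Restart if d is not a descent direction for the estimated gradient.
        if g_new @ d >= 0:
            d = -g_new
        theta, g = theta_new, g_new
    return theta


theta = cg_spsa_minimize(lambda t: float((t[0] - 3.0) ** 2), [0.0])
print(theta)
```

Clipping the PRP coefficient into [0, FR] keeps beta nonnegative and bounded, and the restart test guarantees descent only with respect to the estimated gradient, not the true one. Averaging a few perturbations per gradient (`n_avg`) is a further assumption that trades extra measurements for lower variance in both the direction and the stepsize.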
Published in:

2011 Chinese Control and Decision Conference (CCDC)

Date of Conference:

23-25 May 2011