Simultaneous perturbation stochastic approximation (SPSA) is effective for optimizing complex systems in which the gradient of the objective function is difficult or impossible to obtain directly and only measurements of the objective function are available. SPSA uses these measurements to estimate the gradient efficiently. Many improvements have been proposed to accelerate the convergence of SPSA. A typical one replaces the first-order gradient approximation of standard SPSA with a Newton-Raphson approach. Although the resulting second-order SPSA (2SPSA) algorithm solves the optimization problem successfully through efficient gradient approximation, its accuracy depends on the conditioning of the Hessian of the objective function. To eliminate this dependence on the Hessian, this paper uses the nonlinear conjugate gradient method to determine the search direction. By combining different nonlinear conjugate gradient methods, it ensures that every search direction is a descent direction. Besides the search direction, this paper also improves the step-size calculation of SPSA: a suitable step size is computed from the current and previous gradient information. With a descent search direction and an appropriate step size, the improved SPSA converges faster than 2SPSA. The advantages of the improved SPSA are validated by applying it to reinforcement learning.
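For orientation, the measurement-only gradient estimate used by standard first-order SPSA can be sketched as follows. This is a minimal illustration of the baseline method only, not the improved algorithm proposed in the paper: the function names, the quadratic test objective, and the gain constants (`a`, `c`, `A`, `alpha`, `gamma`) are assumptions chosen for the example, and neither the conjugate gradient search direction nor the improved step-size rule is included.

```python
import random


def spsa_gradient(f, theta, c, rng):
    """Estimate the gradient of f at theta from two measurements of f."""
    # Rademacher perturbation: every component is +1 or -1 with equal probability.
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = [t + c * d for t, d in zip(theta, delta)]
    minus = [t - c * d for t, d in zip(theta, delta)]
    diff = f(plus) - f(minus)
    # All components reuse the same two measurements, whatever the dimension.
    return [diff / (2.0 * c * d) for d in delta]


def spsa_minimize(f, theta0, iterations=2000, a=1.0, c=0.1,
                  A=100.0, alpha=0.602, gamma=0.101, seed=0):
    """Standard first-order SPSA with decaying gains (illustrative constants)."""
    rng = random.Random(seed)
    theta = list(theta0)
    for k in range(iterations):
        a_k = a / (k + 1 + A) ** alpha   # step-size gain (assumed schedule)
        c_k = c / (k + 1) ** gamma       # perturbation size (assumed schedule)
        g = spsa_gradient(f, theta, c_k, rng)
        theta = [t - a_k * g_i for t, g_i in zip(theta, g)]
    return theta


def quadratic(x):
    """Hypothetical test objective with its minimum at the origin."""
    return sum(x_i * x_i for x_i in x)


if __name__ == "__main__":
    start = [1.0, -2.0, 0.5]
    result = spsa_minimize(quadratic, start)
    print("f(start) =", quadratic(start))
    print("f(result) =", quadratic(result))
```

Each iteration costs two evaluations of the objective regardless of the number of parameters, which is what makes SPSA attractive when no gradient is available. The fixed decaying gain `a_k` in this sketch is the part that the step-size rule described above would replace.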