Multi-objective reinforcement learning algorithm and its improved convergency method


Authors: Zhao Jin; Zhang Huajun (Dept. of Control Sci. & Eng., Huazhong Univ. of Sci. & Technol., Wuhan, China)

This paper proposes a multi-objective reinforcement learning algorithm (MORLA) and uses simultaneous perturbation stochastic approximation (SPSA) to improve its convergence. Reinforcement learning (RL) is usually used to design a neurocontroller for a control system with a single objective. For a multi-objective system, the neurocontroller must instead be designed according to the designer's personal preference. MORLA transforms the multiple objectives into a synthetic objective and applies a parallel genetic algorithm (PGA) to evolve the neurocontroller with respect to that synthetic objective. To establish the synthetic objective, the objective weights, which represent the personal preference, are computed by solving a constrained optimization problem (COP) at the end of each generation. The COP requires not only that the synthetic objective have the largest variance over the population, but also that the weights fit the designer's preference. Given the weights, the PGA selects elitists from the population according to the designer's preference and designs a satisfactory neurocontroller through evolutionary operations. In addition, although the GA has good global search ability, it descends slowly near local optima. This paper therefore applies the SPSA algorithm to search for the optimal solution when the GA oscillates in a local region. SPSA converges quickly thanks to an efficient gradient approximation that relies only on measurements of the objective function, and the hybrid algorithm accelerates the learning speed of reinforcement learning. Finally, MORLA is used to design a neurocontroller for a speed-controlled induction motor drive with indirect vector control. Simulation results under different personal preferences for the drive system show the feasibility and validity of MORLA.
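The SPSA step described above estimates a full gradient from only two evaluations of the objective, by perturbing all parameters simultaneously along a random direction. The following is a minimal sketch of that idea, not the paper's implementation: the function name `spsa`, the test objective, and the gain schedules (Spall's standard decay exponents) are illustrative assumptions.

```python
import numpy as np

def spsa(f, theta0, n_iter=500, a=0.2, c=0.1, seed=0):
    """Minimise f by SPSA: each step estimates the gradient of f from only
    two measurements of the objective, using a simultaneous random
    +/-1 perturbation of every parameter at once."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    for k in range(n_iter):
        ak = a / (k + 1) ** 0.602   # gain decay (Spall's standard exponents)
        ck = c / (k + 1) ** 0.101   # perturbation-size decay
        delta = rng.choice([-1.0, 1.0], size=theta.shape)  # Rademacher direction
        # Two-measurement gradient estimate: same pair of evaluations
        # serves every coordinate of the gradient.
        g_hat = (f(theta + ck * delta) - f(theta - ck * delta)) / (2 * ck * delta)
        theta -= ak * g_hat         # descend along the estimated gradient
    return theta

# Illustrative use on a simple quadratic: the iterate drifts toward the minimum.
result = spsa(lambda x: np.sum(x ** 2), [2.0, -1.5])
```

Because the cost per step is two objective evaluations regardless of dimension, this is attractive for refining a high-dimensional neurocontroller once the GA has stalled near a local optimum.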

Published in:

2011 6th IEEE Conference on Industrial Electronics and Applications (ICIEA)

Date of Conference:

21-23 June 2011