This paper proposes a multi-objective reinforcement learning algorithm (MORLA) and uses simultaneous perturbation stochastic approximation (SPSA) to improve its convergence. Reinforcement learning (RL) is usually used to design a neurocontroller for a control system with a single objective. For a multi-objective system, the neurocontroller must instead be designed according to the designer's personal preference. MORLA transforms the multiple objectives into a single synthetic objective and applies a parallel genetic algorithm (PGA) to evolve the neurocontroller according to that objective. To establish the synthetic objective, the objective weights, which represent the personal preference, are calculated by solving a constrained optimization problem (COP) at the end of each generation. The COP requires both that the variance of the synthetic objective across the population be maximized and that the weights fit the designer's preference. Once the weights are obtained, the PGA selects the elite individuals from the population according to the designer's preference and designs a satisfactory neurocontroller through evolutionary operations. In addition, although a GA has good global search ability, it converges slowly within a local region. This paper therefore applies the SPSA algorithm to search for the optimal solution when the GA is oscillating within a local region. SPSA converges quickly because it uses an efficient gradient approximation that relies only on measurements of the objective function. The resulting hybrid algorithm accelerates reinforcement learning. Finally, MORLA is used to design a neurocontroller for a speed-controlled induction motor drive with indirect vector control. Simulation results obtained with different personal preferences for the drive system show the feasibility and validity of MORLA.
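The SPSA gradient approximation mentioned above can be illustrated with a minimal sketch. It is not the paper's implementation. The function name `spsa_minimize`, the gain constants (`a`, `c`, `A`, and the commonly cited decay exponents 0.602 and 0.101), and the quadratic test objective are all assumptions chosen for illustration. The abstract does not specify how the GA hands over to SPSA, so that step is not modeled here. The point of the sketch is that each iteration needs only two measurements of the objective, regardless of the number of parameters.

```python
import random


def spsa_minimize(loss, theta, iterations=500, a=0.5, c=0.1, A=10.0,
                  alpha=0.602, gamma=0.101, seed=0):
    """Minimize `loss` with basic SPSA (illustrative gains, not from the paper)."""
    rng = random.Random(seed)
    theta = list(theta)
    for k in range(iterations):
        # Decaying step size a_k and perturbation size c_k.
        a_k = a / (k + 1 + A) ** alpha
        c_k = c / (k + 1) ** gamma
        # Perturb all parameters at once with random +/-1 signs.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        # Two objective measurements give an estimate of every gradient component.
        diff = loss(plus) - loss(minus)
        theta = [t - a_k * diff / (2.0 * c_k * d) for t, d in zip(theta, delta)]
    return theta


def quadratic(x):
    # Hypothetical stand-in objective with its minimum at (1, 1).
    return sum((xi - 1.0) ** 2 for xi in x)


result = spsa_minimize(quadratic, [3.0, -2.0])
print(result)
```

In a hybrid scheme of the kind the abstract describes, `theta` would presumably be the weight vector of the best neurocontroller found by the GA, and `loss` the synthetic objective evaluated by simulation. Both of these mappings are assumptions here.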