This paper presents an enhanced least-squares approach for solving reinforcement learning control problems. The model-free least-squares policy iteration (LSPI) method has been used successfully in this learning domain. Although LSPI is a promising algorithm that uses a linear approximation architecture to achieve policy optimization in the spirit of Q-learning, it faces challenges in the selection of basis functions and training samples. Inspired by the orthogonal least-squares regression (OLSR) method for selecting the centers of RBF neural networks, we propose a new hybrid learning method. The suggested approach combines the LSPI algorithm with the OLSR strategy and uses simulation as a tool to guide the "feature processing" procedure. Results on learning control of the cart-pole system illustrate the effectiveness of the presented scheme.
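To make the linear-architecture policy-evaluation step concrete, the following is a minimal sketch of the LSTD-Q computation at the core of standard LSPI: given sampled transitions and a fixed basis, it solves a small least-squares system for the Q-function weights. The feature dimensions, synthetic samples, and function name `lstdq_weights` are illustrative assumptions, not the paper's specific basis-selection procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

k = 6        # number of basis functions (e.g. RBF features); assumed for illustration
gamma = 0.95 # discount factor

def lstdq_weights(phi, phi_next, rewards, gamma):
    """One LSTD-Q policy-evaluation step: solve A w = b for the
    linear Q-function weights, where
      A = sum_i phi_i (phi_i - gamma * phi'_i)^T
      b = sum_i phi_i * r_i
    phi:      (n, k) features of sampled (s, a) pairs
    phi_next: (n, k) features of (s', pi(s')) under the current policy
    rewards:  (n,)   sampled immediate rewards
    """
    A = phi.T @ (phi - gamma * phi_next)
    b = phi.T @ rewards
    # least-squares solve stays well-behaved if A is near-singular
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w

# synthetic feature matrices standing in for simulated cart-pole transitions
n = 200
phi = rng.normal(size=(n, k))
phi_next = rng.normal(size=(n, k))
rewards = rng.normal(size=n)

w = lstdq_weights(phi, phi_next, rewards, gamma)
print(w.shape)  # one weight per basis function: (6,)
```

In full LSPI this evaluation step alternates with greedy policy improvement until the weights converge; the quality of the result hinges on the basis functions `phi`, which is precisely the selection problem the OLSR-based procedure addresses.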