Approximate policy iteration (API) is a class of reinforcement learning methods known for stability and sample efficiency. However, sample collection, which is critical to the performance of API methods, remains an open problem. In this paper, a novel adaptive sample collection strategy using active learning-based exploration is proposed to enhance the performance of kernel-based API. In this strategy, an online kernel-based least squares policy iteration (KLSPI) method constructs nonlinear features and approximates the Q-function simultaneously, so that more representative samples can be obtained for value function approximation. Simulation results on typical learning control problems show that the proposed strategy remarkably improves the performance of KLSPI.
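To make the policy evaluation step behind kernel-based least squares policy iteration more concrete, the following is a minimal sketch of least-squares temporal-difference learning for Q-functions (LSTD-Q) with kernel features. It is a generic illustration, not the paper's implementation: the Gaussian kernel, the fixed `dictionary` of state-action points, the `ridge` regularizer, and all function names are assumptions introduced here, and the active learning-based sample collection itself is not shown.

```python
import numpy as np


def gaussian_kernel(x, y, width=1.0):
    """Gaussian (RBF) kernel between two state-action vectors (assumed kernel choice)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * width ** 2)))


def kernel_features(z, dictionary, width=1.0):
    """Feature vector: kernel values between z and each dictionary element."""
    return np.array([gaussian_kernel(z, d, width) for d in dictionary])


def lstdq(samples, policy, dictionary, gamma=0.9, width=1.0, ridge=1e-6):
    """Estimate weights w such that Q(s, a) is approximated by w . phi(s, a).

    samples    : iterable of (s, a, r, s_next) transitions
    policy     : function mapping a state to the action being evaluated
    dictionary : list of state-action vectors defining the kernel features
                 (hypothetical; assumed to be given here)
    """
    n = len(dictionary)
    A = ridge * np.eye(n)  # small ridge term keeps A invertible
    b = np.zeros(n)
    for s, a, r, s_next in samples:
        phi = kernel_features(np.append(s, a), dictionary, width)
        a_next = policy(s_next)
        phi_next = kernel_features(np.append(s_next, a_next), dictionary, width)
        # Accumulate the LSTD-Q normal equations A w = b
        A += np.outer(phi, phi - gamma * phi_next)
        b += r * phi
    return np.linalg.solve(A, b)


def q_value(s, a, w, dictionary, width=1.0):
    """Evaluate the approximate Q-function at (s, a)."""
    return float(w @ kernel_features(np.append(s, a), dictionary, width))
```

In a full policy iteration loop, the weights returned by `lstdq` would define a greedy policy that is then evaluated again on the collected samples. Which samples are collected is exactly the question the proposed strategy addresses.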