To exploit the advantages of both parametric and non-parametric models, a semi-parametric support vector machine (SVM) is proposed that combines a non-parametric SVM model with a parametric linear basis function model. The semi-parametric SVM is used to estimate, online, the Q values of continuous-state, discrete-action pairs, thereby extending the standard Q-learning method to continuous state spaces. Simulation results for the balancing control problem of an inverted pendulum show that the proposed Q-learning method adapts well to changes in system parameters and initial states, providing a new approach to the generalization problem of reinforcement learning in continuous spaces.
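As a rough illustration of the idea, the sketch below models each discrete action's Q function as a kernel expansion plus a linear basis term, and forms the standard Q-learning target from it. It is not the authors' formulation: it assumes a regularized least-squares fit in place of the SVM training, a Gaussian kernel, a constant-plus-state linear basis, and batch refitting rather than a true online update. All names (`SemiParametricQ`, `fit_action`, `q_learning_target`) and parameter values are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, width=1.0):
    # Gaussian kernel matrix between the rows of A and the rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def linear_basis(X):
    # Parametric part (assumed): a constant term plus the raw state components.
    return np.hstack([np.ones((X.shape[0], 1)), X])

class SemiParametricQ:
    """Q(s, a) = sum_i alpha_i k(s_i, s) + phi(s)^T beta, one model per action."""

    def __init__(self, n_actions, reg=1e-2, width=1.0):
        self.n_actions, self.reg, self.width = n_actions, reg, width
        self.models = [None] * n_actions  # (stored states, alpha, beta)

    def fit_action(self, a, S, targets):
        K = rbf_kernel(S, S, self.width)
        P = linear_basis(S)
        n, m = P.shape
        # Saddle-point system of the regularized least-squares problem:
        # (K + reg*I) alpha + P beta = targets,  P^T alpha = 0.
        A = np.block([[K + self.reg * np.eye(n), P],
                      [P.T, np.zeros((m, m))]])
        b = np.concatenate([targets, np.zeros(m)])
        sol = np.linalg.lstsq(A, b, rcond=None)[0]
        self.models[a] = (S.copy(), sol[:n], sol[n:])

    def q_values(self, s):
        s = np.atleast_2d(np.asarray(s, dtype=float))
        out = np.zeros(self.n_actions)  # unfitted actions default to 0
        for a, model in enumerate(self.models):
            if model is not None:
                S, alpha, beta = model
                k = rbf_kernel(s, S, self.width)
                out[a] = (k @ alpha + linear_basis(s) @ beta)[0]
        return out

def q_learning_target(q, reward, next_state, gamma=0.95, terminal=False):
    # Standard Q-learning target: r + gamma * max_a' Q(s', a').
    if terminal:
        return reward
    return reward + gamma * q.q_values(next_state).max()
```

In a control loop, transitions gathered for each action would be turned into targets with `q_learning_target` and passed to `fit_action`. The linear term captures the global trend of the Q function, while the kernel term absorbs the local residual, which is the division of labor the semi-parametric model is meant to provide.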
Published in: 2010 Chinese Control and Decision Conference (CCDC)
Date of Conference: 26-28 May 2010