Among adaptive critic designs (ACD), the dual heuristic programming (DHP) approach is particularly effective at solving approximate dynamic programming problems. DHP implementations commonly use a multilayer feedforward neural network (MLFNN) as the differential model of the plant when training the critic and action networks. In practice, however, MLFNN training is often hampered by overfitting and premature convergence to local optima. This paper proposes a least squares support vector machine (LS-SVM) regressor, optimized by particle swarm optimization (PSO), for generating the control actions and the learning rules for the critic and action networks. PSO is used to select the LS-SVM hyper-parameters. The SVM-based training mechanism gives the developed algorithm an inherent capacity to resist overfitting and lets it converge to the optimum with relatively high efficiency. Simulations of balancing a cart-pole plant show that the proposed learning strategy converges faster and is more efficient than traditional backpropagation (BP) based adaptive dynamic programming approaches.
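The core idea of tuning an LS-SVM regressor with PSO can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the RBF kernel, the use of a held-out validation error as the PSO fitness, the search bounds, the swarm coefficients, and all function names (`rbf_kernel`, `lssvm_fit`, `lssvm_predict`, `validation_mse`, `pso_select`). It covers only generic regression and hyper-parameter selection, not the critic and action learning rules.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    # Gaussian kernel between the rows of A (n x d) and the rows of B (m x d).
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def lssvm_fit(X, y, gamma, sigma):
    # LS-SVM regression reduces to one linear system:
    # [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]  # bias b, dual weights alpha

def lssvm_predict(X_train, b, alpha, sigma, X_new):
    # f(x) = sum_i alpha_i k(x, x_i) + b
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

def validation_mse(log_params, X_tr, y_tr, X_va, y_va):
    # Fitness of one particle; parameters are searched in log10 space.
    gamma, sigma = 10.0 ** log_params
    b, alpha = lssvm_fit(X_tr, y_tr, gamma, sigma)
    pred = lssvm_predict(X_tr, b, alpha, sigma, X_va)
    return float(np.mean((pred - y_va) ** 2))

def pso_select(X_tr, y_tr, X_va, y_va, n_particles=12, n_iter=25, seed=0):
    # Global-best PSO over (log10 gamma, log10 sigma); bounds are assumed.
    rng = np.random.default_rng(seed)
    lo, hi = np.array([-1.0, -1.0]), np.array([3.0, 1.0])
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia and acceleration coefficients (assumed)

    def cost_of(swarm):
        return np.array([validation_mse(p, X_tr, y_tr, X_va, y_va) for p in swarm])

    pos = rng.uniform(lo, hi, size=(n_particles, 2))
    vel = np.zeros_like(pos)
    pbest, pbest_cost = pos.copy(), cost_of(pos)
    g = int(np.argmin(pbest_cost))
    gbest, gbest_cost = pbest[g].copy(), pbest_cost[g]

    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        cost = cost_of(pos)
        improved = cost < pbest_cost
        pbest[improved] = pos[improved]
        pbest_cost[improved] = cost[improved]
        g = int(np.argmin(pbest_cost))
        if pbest_cost[g] < gbest_cost:
            gbest, gbest_cost = pbest[g].copy(), pbest_cost[g]

    return 10.0 ** gbest, float(gbest_cost)  # (gamma, sigma), best validation MSE
```

Calling `pso_select(X_tr, y_tr, X_va, y_va)` on a training and validation split returns the selected `(gamma, sigma)` pair together with its validation error. Because fitting an LS-SVM amounts to solving a single linear system, training itself involves no iterative weight search that could stall in a local optimum. The PSO stage, by contrast, is a heuristic search and is not guaranteed to find the best hyper-parameters.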