
Optimal adaptive control for unknown systems using output feedback by reinforcement learning methods



Authors: F. L. Lewis; K. G. Vamvoudakis (Automation & Robotics Research Institute, University of Texas at Arlington, Fort Worth, TX, USA)

Optimal feedback controllers are generally computed offline assuming full knowledge of the system dynamics. Adaptive controllers, on the other hand, are online schemes that effectively learn to compensate for unknown system dynamics and disturbances. However, direct adaptive schemes generally do not converge to optimal control solutions for user-prescribed performance measures. In recent years, it has been shown that reinforcement learning techniques from computational intelligence can be used to learn optimal feedback controllers online, using direct adaptive control techniques, without knowing the system dynamics. Most reinforcement learning methods require full measurements of the system's internal state. In this paper we develop reinforcement learning methods that require only output feedback and yet converge to an optimal controller. Deterministic linear time-invariant systems are considered, and both policy iteration (PI) and value iteration (VI) algorithms are derived. This corresponds to optimal control for a class of partially observable Markov decision processes (POMDPs). It is shown that, like Q-learning, the new output-feedback optimal learning methods have the important advantage that knowledge of the system dynamics is not needed for their implementation; only the system order and an upper bound on its observability index must be known. The learned output-feedback controller takes the form of a polynomial ARMA controller whose performance is equivalent to that of the optimal state-variable feedback gain.
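To make the PI structure concrete, the following is a minimal sketch of classical *model-based* policy iteration for a discrete-time LQR problem on a hypothetical toy system. This is not the paper's output-feedback algorithm: the paper learns an equivalent controller online from input/output data without knowing (A, B), whereas here the model is used directly to show the evaluate/improve loop that such methods converge to.

```python
import numpy as np

# Hypothetical toy system matrices (assumptions for illustration only)
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)            # state-weighting matrix
R = np.array([[1.0]])    # control-weighting matrix

def policy_evaluation(K, iters=2000):
    """Solve the Lyapunov equation P = Q + K'RK + (A-BK)' P (A-BK) by fixed-point iteration."""
    Acl = A - B @ K
    P = np.zeros_like(Q)
    for _ in range(iters):
        P = Q + K.T @ R @ K + Acl.T @ P @ Acl
    return P

def policy_improvement(P):
    """Greedy gain update: K = (R + B'PB)^(-1) B'PA."""
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

K = np.array([[1.0, 10.0]])   # an initial stabilizing gain (assumption)
for _ in range(10):
    P = policy_evaluation(K)
    K_new = policy_improvement(P)
    if np.linalg.norm(K_new - K) < 1e-8:   # PI converges in a few iterations
        break
    K = K_new
```

The paper's contribution is to replace the model-based evaluation step with an online least-squares fit over measured outputs and inputs, so the matrices (A, B) never appear in the implementation.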

Published in:

2010 8th IEEE International Conference on Control and Automation (ICCA)

Date of Conference:

9-11 June 2010