Optimal trade-off between exploration and exploitation

3 Author(s)
Simpkins, A. (Dept. of Mech. & Aerosp. Eng., Univ. of California, San Diego, La Jolla, CA); de Callafon, R.; Todorov, E.

Control in an uncertain environment often involves a trade-off between exploratory actions, whose goal is to gather sensory information, and "regular" actions, which exploit the information gathered so far and pursue the task objectives. In principle, both types of action can be modeled by minimizing a single cost function within the framework of stochastic optimal control. In practice, however, this is difficult because the control law must be sensitive to estimation uncertainty, which violates the certainty-equivalence principle. In this paper we formalize the problem in a way that captures the essence of the exploration-exploitation trade-off and yet is amenable to numerical methods for optimal control. The key to our approach is augmenting the dynamics of the partially observable plant with the Kalman filter dynamics, thus obtaining a higher-dimensional but fully observable plant. The resulting control laws compare favorably to other, more ad hoc approaches. Our formalism is also suitable for modeling human behavior in tasks that benefit from active exploration.
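The augmentation idea in the abstract can be illustrated with a minimal sketch. It is not the paper's model or control law: it assumes a scalar linear-Gaussian plant, and the function name `augmented_step` and all parameter values (`a`, `b`, `c`, `q`, `r`) are hypothetical choices made for illustration. The controller cannot see the true state, but it always knows the Kalman estimate `x_hat` and its variance `P`, so the pair `(x_hat, P)` can be treated as the state of a larger, fully observable plant.

```python
import random


def augmented_step(x_hat, P, u, a=0.9, b=1.0, c=1.0, q=0.1, r=0.5, rng=random):
    """Advance the augmented state (x_hat, P) by one time step.

    Assumed (hypothetical) scalar plant:
        x_next = a * x + b * u + w,   w ~ N(0, q)
        y      = c * x + v,           v ~ N(0, r)
    """
    # Kalman prediction of the mean and variance.
    x_pred = a * x_hat + b * u
    P_pred = a * a * P + q

    # Innovation variance and Kalman gain.
    S = c * c * P_pred + r
    K = P_pred * c / S

    # Seen from the controller, the innovation y - c * x_pred is
    # zero-mean Gaussian with variance S, so the estimate itself
    # evolves as a stochastic but fully observed state.
    innovation = rng.gauss(0.0, S ** 0.5)
    x_hat_next = x_pred + K * innovation

    # The variance update is deterministic for this plant.
    P_next = (1.0 - K * c) * P_pred
    return x_hat_next, P_next


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    x_hat, P = 0.0, 1.0
    for k in range(5):
        x_hat, P = augmented_step(x_hat, P, u=0.0, rng=rng)
        print(k, round(x_hat, 3), round(P, 3))
```

In this purely linear-Gaussian sketch the variance `P` evolves independently of the control `u`, so exploration brings no benefit. The trade-off described in the abstract arises only when actions influence how much information is gathered, or when the cost depends on the uncertainty. A control law defined over `(x_hat, P)` is then free to respond to estimation uncertainty.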

Published in:

American Control Conference, 2008

Date of Conference:

11-13 June 2008