Active exploration by searching for experiments that falsify the computed control policy

Authors (4)
Fonteneau, R.; Dept. of Electr. Eng. & Comput. Sci., Univ. of Liège, Liège, Belgium; Murphy, S.A.; Wehenkel, L.; Ernst, D.

We propose a strategy for experiment selection, in the context of reinforcement learning, based on the idea that the most interesting experiments to carry out at some stage are those most liable to falsify the current hypothesis about the optimal control policy. We cast this idea in a setting where a policy learning algorithm and a model identification method are given a priori. An experiment is selected if, according to the learnt environment model, it is predicted to yield a revision of the learnt control policy. Algorithms and simulation results are provided for a deterministic system with a discrete action space; they show that the proposed approach is promising.
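The selection rule described in the abstract can be illustrated with a minimal Python sketch. This is not the algorithm from the paper: the helpers `learn_model` (model identification), `learn_policy` (policy learning), the candidate set of state-action pairs, and the set of states over which policies are compared are all hypothetical names introduced here purely for illustration.

```python
def select_experiment(dataset, candidates, learn_model, learn_policy, states):
    """Return a candidate (state, action) predicted to falsify the current policy.

    A minimal sketch, assuming `dataset` is a list of (s, a, r, s') tuples,
    `learn_model(dataset)` returns a callable (s, a) -> (s', r), and
    `learn_policy(dataset)` returns a callable s -> a. All names are
    illustrative assumptions, not the paper's interface.
    """
    model = learn_model(dataset)    # identify an environment model from the data
    policy = learn_policy(dataset)  # current hypothesis about the optimal policy

    for state, action in candidates:
        # Predict the experiment's outcome with the learnt model instead of
        # running it on the real system.
        next_state, reward = model(state, action)
        hypothetical = dataset + [(state, action, reward, next_state)]
        revised = learn_policy(hypothetical)

        # The experiment is interesting if its predicted outcome would change
        # the learnt control policy somewhere, i.e. falsify the current hypothesis.
        if any(revised(s) != policy(s) for s in states):
            return (state, action)

    return None  # no candidate is predicted to revise the policy
```

Under these assumptions, the sketch simply re-runs the given policy learner on the dataset augmented with the model-predicted outcome of each candidate experiment and keeps the first candidate whose addition changes the policy.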

Published in:

2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Date of Conference:

11-15 April 2011