Basis function adaptation methods for cost approximation in MDP

Authors:

Huizhen Yu (Dept. of Computer Science, University of Helsinki, Helsinki); D. P. Bertsekas

Abstract:

We generalize a basis adaptation method for cost approximation in Markov decision processes (MDPs), extending earlier work of Menache, Mannor, and Shimkin. In our context, basis functions are parametrized, and their parameters are tuned by minimizing an objective function involving the cost function approximation obtained when a temporal difference (TD) method or another approximation method is used. The adaptation scheme involves only low-order calculations and can be implemented in a way analogous to policy gradient methods. Within this generalized basis adaptation framework, we provide extensions to TD methods for nonlinear optimal stopping problems and to alternative cost approximations beyond those based on TD.
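To make the scheme concrete, the following Python sketch illustrates the idea on a small Markov chain: basis functions phi(s; theta) are parametrized, LSTD(0) produces the TD weights r(theta) for each theta, and theta is tuned by gradient descent on an objective built from the resulting approximation. This is a hypothetical minimal example, not the authors' implementation: the Gaussian bases, the uniformly weighted projection, the Bellman-residual objective, and the finite-difference gradients (standing in for the paper's low-order gradient formulas) are all illustrative assumptions.

import numpy as np

n_states = 10
rng = np.random.default_rng(0)

# Assumed example data: a fixed Markov chain with per-state costs.
P = rng.dirichlet(np.ones(n_states), size=n_states)  # row-stochastic transitions
g = rng.random(n_states)                             # stage costs
gamma = 0.9                                          # discount factor

def basis(theta):
    # Parametrized Gaussian bases: Phi[s, j] = exp(-(s - theta_j)^2 / 8).
    s = np.arange(n_states)[:, None]
    return np.exp(-((s - theta[None, :]) ** 2) / 8.0)

def lstd_weights(Phi):
    # LSTD(0): solve the projected Bellman equation for the TD weights r(theta)
    # (uniform-weighted projection, an assumption made for brevity).
    A = Phi.T @ (Phi - gamma * P @ Phi)
    b = Phi.T @ g
    return np.linalg.solve(A, b)

def objective(theta):
    # Squared Bellman residual of the TD/LSTD approximation for this basis.
    Phi = basis(theta)
    J = Phi @ lstd_weights(Phi)
    return np.sum((J - (g + gamma * P @ J)) ** 2)

# Tune the basis parameters by finite-difference gradient descent; the paper
# instead derives gradient estimates computable with low-order calculations.
theta = np.linspace(0.0, n_states - 1, 4)  # centers of 4 basis functions
step, eps = 0.1, 1e-5
for _ in range(100):
    grad = np.array([
        (objective(theta + eps * e) - objective(theta - eps * e)) / (2 * eps)
        for e in np.eye(len(theta))
    ])
    theta -= step * grad

print("adapted centers:", np.round(theta, 3), "residual:", objective(theta))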

Published in:

2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL '09)

Date of Conference:

March 30 - April 2, 2009