Partially Observable Markov Decision Processes With Reward Information: Basic Ideas and Models

Authors: Xi-Ren Cao (Hong Kong Univ. of Sci. & Technol.); Xianping Guo

In a partially observable Markov decision process (POMDP), if the reward can be observed at each step, then the observed reward history contains information about the unknown state. This information, in addition to that contained in the observation history, can be used to update the state probability distribution. The policy thus obtained is called a reward-information policy (RI-policy); an optimal RI-policy performs no worse than any normal optimal policy that depends only on the observation history. This observation leads to four different problem formulations for POMDPs, depending on whether the reward function is known and whether the reward at each step is observable. This exploratory work may attract attention to these interesting problems.
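
The idea of using the observed reward to update the state probability distribution can be illustrated with a small Bayes-filter sketch. This is not the authors' formulation, only a minimal illustration under stated assumptions: a finite state space, a known reward model, and a reward that depends on the current state and the chosen action. The names `belief_update`, `T_a`, `obs_lik`, and `reward_lik` are hypothetical and do not come from the paper.

```python
def normalize(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights)
    if total == 0:
        raise ValueError("observed data has zero probability under the model")
    return [w / total for w in weights]


def belief_update(belief, T_a, obs_lik, reward_lik=None):
    """One belief-update step for a fixed action (illustrative assumptions).

    belief[s]     : prior probability of the current state s
    T_a[s][s2]    : probability of moving from s to s2 under the chosen action
    obs_lik[s2]   : likelihood of the new observation given next state s2
    reward_lik[s] : likelihood of the observed reward given state s and the
                    action; None means the reward is not used
    """
    n = len(belief)
    if reward_lik is None:
        reward_lik = [1.0] * n  # reward carries no information
    # Weight the prior by how well each state explains the observed reward.
    weighted = [belief[s] * reward_lik[s] for s in range(n)]
    # Propagate through the transition model.
    predicted = [sum(weighted[s] * T_a[s][s2] for s in range(n)) for s2 in range(n)]
    # Weight by the observation likelihood and renormalize.
    return normalize([obs_lik[s2] * predicted[s2] for s2 in range(n)])


# Two states that never change, with an observation that says nothing.
prior = [0.5, 0.5]
stay = [[1.0, 0.0], [0.0, 1.0]]
uninformative = [0.5, 0.5]
# Assumed reward model: state 0 always pays 1, state 1 always pays 0,
# and a reward of 1 was observed.
saw_reward_one = [1.0, 0.0]

without_reward = belief_update(prior, stay, uninformative)
with_reward = belief_update(prior, stay, uninformative, saw_reward_one)
```

In this toy case the observation alone leaves the belief at `[0.5, 0.5]`, while conditioning on the observed reward concentrates it on state 0. A policy acting on the second belief has more information available to it than one acting on the first.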

Published in: IEEE Transactions on Automatic Control (Volume 52, Issue 4)