Partially Observed Stochastic Shortest Path Problems With Approximate Solution by Neurodynamic Programming

Author: S.D. Patek (University of Virginia, Charlottesville)

We analyze a class of Markov decision processes with imperfect state information that evolve on an infinite time horizon and have a total cost criterion. In particular, we are interested in problems with stochastic shortest path structure, assuming the following: 1) the existence of a policy that guarantees termination with probability one and 2) the property that any policy that fails to guarantee termination has infinite expected cost from some initial state. We also assume that termination is perfectly recognized. In this paper, we clarify and expand upon arguments (given in an earlier paper) for establishing the existence, uniqueness, and characterization of stationary optimal policies, and the convergence of value and policy iteration. We also present an illustrative example, involving the search for a partially observed target that moves randomly on a grid, and we develop a simulation-based algorithm (based on neurodynamic programming techniques) for computing policies that approximately minimize the expected number of stages to complete the search.
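The value iteration whose convergence the abstract discusses can be illustrated on a toy problem. The sketch below is a minimal, fully observed example only; the paper's actual setting is partially observed (value iteration would run on belief states), and the chain, action names, and numbers here are invented for illustration. It shows the two structural assumptions at work: one action ("careful") guarantees termination, so a proper policy exists, and repeated application of the Bellman operator converges to the optimal cost-to-go.

```python
# Toy stochastic shortest path (SSP) value iteration -- a hedged
# illustration, not the paper's algorithm. States 0..2 are non-terminal;
# state 3 is the cost-free, absorbing terminal state.
N = 3

# Two hypothetical actions per state:
#   action 0 "careful": advance with prob. 1.0, stage cost 2
#   action 1 "risky":   advance with prob. 0.8, stage cost 1
P_ADVANCE = [1.0, 0.8]
COST = [2.0, 1.0]

def bellman(J):
    """One application of the SSP Bellman operator (min over actions)."""
    Jt = J + [0.0]  # terminal state contributes zero cost-to-go
    return [min(COST[a] + P_ADVANCE[a] * Jt[s + 1]
                        + (1.0 - P_ADVANCE[a]) * Jt[s]
                for a in range(2))
            for s in range(N)]

# Value iteration: under the paper's assumptions (a proper policy exists,
# improper policies have infinite cost) this converges to the unique
# optimal cost-to-go J*.
J = [0.0] * N
for _ in range(200):
    J = bellman(J)

# Here "risky" is optimal everywhere: expected stages-to-go are
# J*(2) = 1/0.8 = 1.25, J*(1) = 2.5, J*(0) = 3.75.
```

Starting from any initial guess, the iterates contract toward the same fixed point; the paper establishes the analogous convergence (and the existence and uniqueness of the stationary optimal policy) in the partially observed case.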

Published in:

IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans (Volume 37, Issue 5)