We analyze a class of Markov decision processes with imperfect state information that evolve on an infinite time horizon and have a total cost criterion. In particular, we are interested in problems with stochastic shortest path structure, under two assumptions: 1) there exists a policy that guarantees termination with probability one, and 2) any policy that fails to guarantee termination has infinite expected cost from some initial state. We also assume that termination is perfectly recognized. In this paper, we clarify and expand upon arguments (given in an earlier paper) establishing the existence, uniqueness, and characterization of stationary optimal policies, and the convergence of value and policy iteration. We also present an illustrative example, involving the search for a partially observed target that moves randomly on a grid, and we develop a simulation-based algorithm, based on neuro-dynamic programming techniques, for computing policies that approximately minimize the expected number of stages needed to complete the search.
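As a concrete illustration of the kind of problem described above, the following is a minimal, hedged simulation sketch of a belief-state grid-search instance. It is not the paper's algorithm: the grid size, the target's motion model, and the simple greedy policy (inspect the currently most likely cell) are all illustrative assumptions. The sketch does, however, exhibit the structural features the abstract names: the searcher pays one unit of cost per stage, termination occurs exactly when the target is found (and is perfectly recognized), and the searcher acts on a Bayesian belief over the target's unobserved position.

```python
import random

# Illustrative assumptions (not from the paper):
N = 5          # cells on a 1-D grid
MOVE_P = 0.5   # probability the target moves to a random neighbor each stage

def step_target(pos):
    """Target's random walk: stay put, or step to a neighbor (clamped at edges)."""
    if random.random() < MOVE_P:
        pos = max(0, min(N - 1, pos + random.choice([-1, 1])))
    return pos

def propagate(belief):
    """Push the belief forward through the same motion model as step_target."""
    new = [0.0] * N
    for i, p in enumerate(belief):
        new[i] += p * (1 - MOVE_P)
        new[max(0, i - 1)] += p * MOVE_P / 2
        new[min(N - 1, i + 1)] += p * MOVE_P / 2
    return new

def simulate(max_stages=200):
    """Run one search episode under the greedy policy; return stages to termination."""
    pos = random.randrange(N)          # true (unobserved) target position
    belief = [1.0 / N] * N             # uniform prior over positions
    for stage in range(1, max_stages + 1):
        look = max(range(N), key=lambda i: belief[i])  # greedy: most likely cell
        if look == pos:
            return stage               # target found: perfectly recognized termination
        # Bayes update on a miss: the target is not in the inspected cell.
        belief[look] = 0.0
        total = sum(belief)
        belief = propagate([b / total for b in belief])
        pos = step_target(pos)
    return max_stages                  # safety cap for the simulation

random.seed(0)
avg = sum(simulate() for _ in range(2000)) / 2000
print(f"estimated expected stages under greedy policy: {avg:.2f}")
```

The Monte Carlo average printed at the end plays the role of the total cost criterion: the expected number of stages to complete the search, which the paper's simulation-based algorithm seeks to approximately minimize over policies rather than merely evaluate for one heuristic.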