I. Introduction
In path planning and trajectory optimization problems, an autonomous robot may need to execute a task with limited or no knowledge of the future effects of its immediate actions [1]. For example, an Unmanned Aerial Vehicle (UAV) may be tasked with tracking, protecting, or providing surveillance of a ground-based target. If the target trajectory is known, a deterministic optimization or control problem can be solved to obtain a feasible UAV trajectory. Our goal in this work is to develop a feedback control policy that allows a UAV to optimally maintain a nominal standoff distance from the target without full knowledge of the current target position or its future trajectory.