This paper addresses the problem of visual tracking under very general conditions: a possibly non-rigid target whose appearance may change drastically over time; general camera motion; a 3D scene; and no a priori information other than initialization. This contrasts with the vast majority of trackers, which rely on some limited model in which, for example, the target's appearance is known a priori or restricted, the scene is planar, or a pan-tilt-zoom camera is used. Such restrictions are adopted to achieve speed and robustness, but the limited context may cause these trackers to fail in the more general case. The proposed tracker works by approximating, in each frame, the PDF (probability distribution function) of the target's bitmap, i.e., the binary map of which pixels belong to the target, and then estimating the maximum a posteriori (MAP) bitmap. The PDF is marginalized over all possible per-pixel motions, thus avoiding an explicit optical-flow estimation stage. This is an advantage over other general-context trackers, which either do not use the motion cue at all or rely on error-prone optical-flow computation. Using a Gibbs distribution with respect to the first-order neighborhood system yields a bitmap PDF whose maximization can be transformed into the maximization of a quadratic pseudo-Boolean function, whose maximum is in turn approximated by reduction to a maximum-flow problem. Numerous experiments demonstrate that the tracker is able to track under these general conditions.
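The abstract only outlines how the MAP bitmap is obtained. As a minimal sketch, assuming that per-pixel target probabilities are already available (in the paper these would come from marginalizing over per-pixel motions, which is not reproduced here) and that the first-order Gibbs prior takes the simple Ising form of a constant penalty `w` for every pair of 4-neighbours with different labels, the Python code below shows the standard reduction of such a quadratic pseudo-Boolean energy to a minimum s-t cut. Maximizing the posterior is treated as minimizing its negative logarithm. The names `map_bitmap`, `bitmap_energy`, `max_flow_min_cut`, `p_target` and `w` are hypothetical. For this submodular special case (`w >= 0`) the cut gives the exact optimum, whereas the paper's energy is only approximately maximized, so this should not be read as the authors' implementation.

```python
import math
from collections import deque

EPS = 1e-12  # residual capacities at or below this are treated as saturated


def _clamp(p):
    # Keep probabilities strictly inside (0, 1) so -log(p) stays finite.
    return min(max(p, 1e-9), 1.0 - 1e-9)


def max_flow_min_cut(cap, source, sink):
    """Edmonds-Karp max-flow on a dense capacity matrix.

    Returns the flow value and the set of nodes on the source side of a minimum cut.
    """
    n = len(cap)
    residual = [row[:] for row in cap]
    flow = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = [-1] * n
        parent[source] = source
        queue = deque([source])
        while queue and parent[sink] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u][v] > EPS:
                    parent[v] = u
                    queue.append(v)
        if parent[sink] == -1:
            break  # no augmenting path left: the flow is maximal
        # Find the bottleneck capacity along the path, then push that much flow.
        bottleneck, v = math.inf, sink
        while v != source:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while v != source:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck
    # Nodes still reachable from the source in the residual graph form the source side.
    source_side, queue = {source}, deque([source])
    while queue:
        u = queue.popleft()
        for v in range(n):
            if v not in source_side and residual[u][v] > EPS:
                source_side.add(v)
                queue.append(v)
    return flow, source_side


def bitmap_energy(bitmap, p_target, w):
    """Negative log posterior (up to an additive constant) of a 0/1 bitmap
    under independent per-pixel likelihoods and a 4-neighbour Ising prior."""
    rows, cols = len(bitmap), len(bitmap[0])
    energy = 0.0
    for r in range(rows):
        for c in range(cols):
            p = _clamp(p_target[r][c])
            energy += -math.log(p) if bitmap[r][c] else -math.log(1.0 - p)
            if c + 1 < cols and bitmap[r][c] != bitmap[r][c + 1]:
                energy += w
            if r + 1 < rows and bitmap[r][c] != bitmap[r + 1][c]:
                energy += w
    return energy


def map_bitmap(p_target, w):
    """MAP bitmap (1 = target) via a minimum s-t cut; requires w >= 0 (submodular)."""
    rows, cols = len(p_target), len(p_target[0])
    n_pix = rows * cols
    s, t = n_pix, n_pix + 1  # source side = background (0), sink side = target (1)
    cap = [[0.0] * (n_pix + 2) for _ in range(n_pix + 2)]
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            p = _clamp(p_target[r][c])
            cap[s][i] = -math.log(p)        # cut when pixel i is labelled target
            cap[i][t] = -math.log(1.0 - p)  # cut when pixel i is labelled background
            for rr, cc in ((r, c + 1), (r + 1, c)):  # right and down neighbours
                if rr < rows and cc < cols:
                    j = rr * cols + cc
                    cap[i][j] += w  # cut when i is background and j is target
                    cap[j][i] += w  # cut when j is background and i is target
    flow, source_side = max_flow_min_cut(cap, s, t)
    bitmap = [[0 if r * cols + c in source_side else 1 for c in range(cols)]
              for r in range(rows)]
    return bitmap, flow
```

For instance, `map_bitmap` on a 3x3 grid of probability 0.9 with a 0.4 centre and `w = 1.0` returns an all-ones bitmap, because labelling the centre as background would add four neighbour disagreements. The dense Edmonds-Karp solver is used here only for readability; at real frame sizes a dedicated max-flow solver (for example, the Boykov-Kolmogorov algorithm) would be the usual choice.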