Skip to Main Content
The effectiveness of probabilistic tracking of objects in image sequences has been revolutionized by the development of particle filtering. Whereas Kalman filters are restricted to Gaussian distributions, particle filters can propagate more general distributions, albeit only approximately. This is of particular benefit in visual tracking because of the inherent ambiguity of the visual world that stems from its richness and complexity. One important advantage of the particle filtering framework is that it allows the information from different measurement sources to be fused in a principled manner. Although this fact has been acknowledged before, it has not been fully exploited within a visual tracking context. Here we introduce generic importance sampling mechanisms for data fusion and discuss them for fusing color with either stereo sound, for teleconferencing, or with motion, for surveillance with a still camera. We show how each of the three cues can be modeled by an appropriate data likelihood function, and how the intermittent cues (sound or motion) are best handled by generating proposal distributions from their likelihood functions. Finally, the effective fusion of the cues by particle filtering is demonstrated on real teleconference and surveillance data.