Automated video tracking is useful in applications such as surveillance, multisensor networks, robotics, and virtual reality. In this paper we investigate an approach to tracking based on fusing the output of a collection of video trackers, each attending to a different feature or cue on the target. We show both theoretically and experimentally that the method used to prune the growth of target hypotheses can strongly affect the tracker's performance and, indirectly, change the benefit of using a linear score combination rather than a non-linear rank combination for fusion. We also show that the rank-score graph defined by Hsu and Taksa can be used to select a subset of features to fuse in order to reduce classification error.
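To make the distinction between the two fusion schemes concrete, the following is a minimal sketch and not the paper's implementation. It assumes each cue tracker assigns a score on a comparable [0, 1] scale to the same list of target hypotheses. Score combination is taken to be the average of the scores, and rank combination the average of the ranks those scores induce. The function names and the numbers are hypothetical, chosen only so that the two schemes disagree.

```python
from typing import List


def ranks_from_scores(scores: List[float]) -> List[int]:
    """Convert scores to ranks, where rank 1 is the highest score.

    Ties are broken by original index order (Python's sort is stable).
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for position, idx in enumerate(order, start=1):
        ranks[idx] = position
    return ranks


def score_combination(score_lists: List[List[float]]) -> List[float]:
    """Linear fusion: average the scores each tracker gives a hypothesis."""
    n = len(score_lists[0])
    return [sum(s[i] for s in score_lists) / len(score_lists) for i in range(n)]


def rank_combination(score_lists: List[List[float]]) -> List[float]:
    """Non-linear fusion: average the ranks each tracker gives a hypothesis."""
    rank_lists = [ranks_from_scores(s) for s in score_lists]
    n = len(score_lists[0])
    return [sum(r[i] for r in rank_lists) / len(rank_lists) for i in range(n)]


# Hypothetical scores for three target hypotheses from three cue trackers.
cue_a = [0.95, 0.50, 0.10]
cue_b = [0.30, 0.40, 0.20]
cue_c = [0.30, 0.40, 0.20]
all_cues = [cue_a, cue_b, cue_c]

combined_scores = score_combination(all_cues)
combined_ranks = rank_combination(all_cues)

# Highest combined score wins under score fusion;
# lowest combined rank wins under rank fusion.
n_hyp = len(cue_a)
best_by_score = max(range(n_hyp), key=lambda i: combined_scores[i])
best_by_rank = min(range(n_hyp), key=lambda i: combined_ranks[i])

print("best hypothesis by score combination:", best_by_score)
print("best hypothesis by rank combination:", best_by_rank)
```

In this toy data a single very confident cue pulls score combination toward hypothesis 0, while rank combination follows the two cues that agree on hypothesis 1. This illustrates why the two fusion rules can select different hypotheses from the same tracker outputs.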