Unlike dense stereo, optical flow, or multi-view stereo, template-based tracking lacks benchmark datasets that allow a fair comparison between state-of-the-art algorithms. Until now, objective and quantitative evaluation of the performance and robustness of template-based tracking algorithms has relied mainly on synthetically generated image sequences. The evaluation is therefore often intrinsically biased. In this paper, we describe the process we carried out to acquire image sequences of real scenes with very precise and accurate ground truth poses, using an industrial camera rigidly mounted on the end-effector of a high-precision robotic measurement arm. For the acquisition, we considered most of the critical parameters that influence the tracking results: the texture richness and texture repeatability of the objects to be tracked, the camera motion and speed, the changes of the object scale in the images, and the variations of the lighting conditions over time. We designed an evaluation scheme for object detection and interframe tracking algorithms and used the image sequences to apply this scheme to several state-of-the-art algorithms. The image sequences will be made freely available for testing, submitting, and evaluating new template-based tracking algorithms, i.e., algorithms that detect or track a planar object in an image sequence given only one image of the object (called the template).