This paper proposes the novel application of an uncommonly rich feature representation to the domain of visual tracking. The proposed representation models both the spatial structure and dynamics of a target in a unified fashion, while simultaneously offering robustness to illumination variations. Specifically, the proposed feature is derived from spatiotemporal energy measurements that are computed by filtering in 3D (x, y, t) image spacetime. These spatiotemporal energy measurements capture the underlying local spacetime orientation structure of the target across multiple scales. The breadth of applicability of these features within the field of visual tracking is demonstrated by their instantiation within three disparate tracking paradigms that are representative of the various basic types of region trackers in the field. Instantiation within these three tracking paradigms requires that the raw oriented energy measurements be post-processed using different methodologies, ranging from histogram accumulation to the identity transform. Qualitative and quantitative empirical evaluation on a challenging suite of videos demonstrates the strength and applicability of the proposed representation to tracking, as it outperforms other commonly used features across all tracking paradigms. Moreover, overall high tracking accuracy can be obtained with the proposed representation, as spatiotemporal oriented energy instantiations are shown to outperform several recent, state-of-the-art trackers.
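The general idea of spacetime oriented energies can be sketched as follows. This is a hypothetical, simplified illustration only, not the paper's actual filters: it approximates oriented responses with finite differences along the t, y, and x axes of a video volume, squares them to obtain pointwise energies, and applies divisive normalization across orientations, a step that confers robustness to multiplicative illumination changes. (The function name `oriented_energies` and all parameters are assumptions for illustration.)

```python
import numpy as np

def oriented_energies(volume, eps=1e-6):
    """Illustrative sketch (not the authors' exact formulation):
    approximate spacetime oriented energies for a (t, y, x) video
    volume as squared derivative responses along each spacetime axis,
    then divisively normalize across orientations."""
    # Finite-difference responses along t, y, and x (crude stand-ins
    # for tuned 3D oriented filters).
    responses = np.gradient(volume.astype(float))
    # Squaring yields pointwise (phase-insensitive) energy measurements.
    energies = np.stack([r ** 2 for r in responses])
    # Divisive normalization: each orientation's energy relative to the
    # total local energy, which cancels multiplicative illumination
    # changes; eps guards against division by zero in uniform regions.
    return energies / (energies.sum(axis=0, keepdims=True) + eps)

# Example: a small random video volume of 8 frames at 16x16 pixels.
vol = np.random.default_rng(0).random((8, 16, 16))
E = oriented_energies(vol)  # shape (3, 8, 16, 16)
```

After normalization the three orientation energies at each spacetime point sum to approximately one wherever local contrast is non-negligible, so the representation responds to orientation structure rather than absolute intensity.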