Abstract:
Convolutional neural networks (CNN) based trackers have been widely employed in visual object tracking due to their powerful representations. Features from different CNN ...Show MoreMetadata
Abstract:
Convolutional neural networks (CNN) based trackers have been widely employed in visual object tracking due to their powerful representations. Features from different CNN layers encode different information. Deeper layers contain more semantic information, while the resolution is too coarse to localize the target. Shallower layers carry more detail information but are less robust for appearance variations. In this paper, we propose an algorithm which incorporates the Spatial and Temporal attention to take full advantage of the Hierarchical Convolutional Features for Tracking (STHCFT). We firstly learn correlation filters on each convolutional layer. Based on the spatial attention inspired by the paraventricular thalamus (PVT) in the brain, we choose the most important layer to build the base response, and the others to be the auxiliary responses. In addition, we make full use of the temporal attention to determine the weights of the auxiliary responses. Finally, the target is located by the maximum value of the fused responses. Extensive experimental results on the benchmark OTB-2013 and OTB-2015 have shown the proposed algorithm performs favorably against several state-of-the-art trackers.
Date of Conference: 14-19 July 2019
Date Added to IEEE Xplore: 30 September 2019
ISBN Information: