I. Introduction
Object tracking is a popular research topic in computer vision that aims to locate a given target, initialized in the first frame, in each subsequent video frame. It is widely used in applications such as intelligent surveillance, autonomous driving, and unmanned aerial vehicles. Although trackers have achieved great success in recent years thanks to the robust target representations provided by deep neural networks [1]–[15], they still suffer from challenging factors such as illumination variation, scale variation, and fast motion.