
Memory Prompt for Spatiotemporal Transformer Visual Object Tracking


Impact Statement:
Transformer structures excel at delivering task-relevant cues by performing triplet mapping (query, key, and value) and nonlinear weighted aggregation. However, the computational burden of a transformer-based tracker grows with the number of tokens involved, which impedes multitemplate temporal modeling. To this end, inspired by recent prompt engineering studies, we propose to transmit the historical target appearance via efficient prompt learning, balancing the tracking needs for both spatiotemporal modeling and speed. A modified window-attention structure is also designed to cooperate with the prompt design, which constrains the self-attention calculation to a local window. The experimental analysis demonstrates the merit of the proposed approach in spatiotemporal appearance modeling via the novel prompt learning design.
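The window-attention idea above can be illustrated with a minimal sketch (assumed for illustration, not the paper's implementation): restricting self-attention to non-overlapping local windows reduces the cost from O(N²) to O(N·w) for N tokens and window size w. All names and shapes here are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(q, k, v, window):
    """q, k, v: (N, d) token matrices; N must be divisible by `window`.

    Each window attends only to itself, so the score matrix is
    (window, window) per block instead of (N, N) overall.
    """
    n, d = q.shape
    out = np.empty_like(v)
    for start in range(0, n, window):
        s = slice(start, start + window)
        scores = q[s] @ k[s].T / np.sqrt(d)   # local scores only
        out[s] = softmax(scores) @ v[s]       # aggregate within the window
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))
y = window_attention(x, x, x, window=4)
print(y.shape)  # (16, 8)
```

A consequence of the locality, visible in the sketch, is that the output for one window is unaffected by tokens outside it; the memory prompt is what reintroduces global, cross-frame context.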

Abstract:

Recent transformer techniques have achieved promising performance boosts in visual object tracking, with their capability to exploit long-range dependencies among relevant tokens. However, a long-range interaction can be achieved only at the expense of huge computation, which is proportional to the square of the number of tokens. This becomes particularly acute in online visual tracking with a memory bank containing multiple templates, which is a widely used strategy to address spatiotemporal template variations. We address this complexity problem by proposing a memory prompt tracker (MPTrack) that enables multitemplate aggregation and efficient interactions among relevant queries and clues. The memory prompt gathers any supporting context from the historical templates in the form of learnable token queries, producing a concise dynamic target representation. The extracted prompt tokens are then fed into a transformer encoder–decoder to inject the relevant clues into the instance, thus ...
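The memory-prompt mechanism described in the abstract can be sketched as learnable query tokens cross-attending over a bank of historical template tokens (a hypothetical, simplified rendering, not the authors' code): however many templates the memory bank holds, the result is a fixed, concise set of prompt tokens.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def memory_prompt(queries, templates):
    """queries: (P, d) learnable prompt tokens; templates: list of (T, d) arrays.

    Cross-attention compresses the whole memory bank into P prompt tokens,
    so downstream attention cost no longer grows with the bank size.
    """
    mem = np.concatenate(templates, axis=0)        # (M*T, d) memory-bank tokens
    d = queries.shape[1]
    attn = softmax(queries @ mem.T / np.sqrt(d))   # (P, M*T) cross-attention weights
    return attn @ mem                              # (P, d) concise prompt tokens

rng = np.random.default_rng(1)
learnable_q = rng.standard_normal((4, 8))               # P = 4 prompt tokens (illustrative)
bank = [rng.standard_normal((16, 8)) for _ in range(3)] # 3 templates, 16 tokens each
prompt = memory_prompt(learnable_q, bank)
print(prompt.shape)  # (4, 8), independent of the number of templates
```

In the paper the resulting prompt tokens are then fed into the transformer encoder-decoder alongside the instance tokens; the sketch only shows the compression step.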
Published in: IEEE Transactions on Artificial Intelligence ( Volume: 5, Issue: 8, August 2024)
Page(s): 3759 - 3764
Date of Publication: 15 January 2024
Electronic ISSN: 2691-4581

