Impact Statement:
Transformer structures excel at delivering task-relevant clues by performing a triplet mapping (query, key, and value) and nonlinear weighted aggregation. However, the computational burden of a transformer-based tracker grows with the number of tokens involved, which impedes multitemplate temporal modeling. To this end, inspired by recent prompt engineering studies, we propose to transmit the historical target appearance via efficient prompt learning, balancing the tracking needs of spatiotemporal modeling and speed. A modified window-attention structure is also designed to cooperate with the prompt design, constraining the self-attention computation to a local window. The experimental analysis demonstrates the merit of the proposed approach to spatiotemporal appearance modeling via the novel prompt learning design.
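To illustrate the window-attention idea only, the following is a minimal PyTorch sketch of self-attention restricted to non-overlapping local windows; the module name, dimensions, and window size are illustrative assumptions, not the paper's actual block.

```python
# Sketch of window-constrained self-attention (illustrative; not the authors' exact design).
import torch
import torch.nn as nn

class WindowSelfAttention(nn.Module):
    """Self-attention computed independently inside non-overlapping local windows."""
    def __init__(self, dim: int, window: int, heads: int = 4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim); tokens assumed divisible by the window size.
        b, n, d = x.shape
        w = self.window
        x = x.view(b * n // w, w, d)   # group tokens into local windows
        out, _ = self.attn(x, x, x)    # attention cost is O(w^2) per window, not O(n^2)
        return out.view(b, n, d)

# Usage: 256 tokens attended in windows of 16, instead of one 256x256 attention map.
tokens = torch.randn(2, 256, 64)
print(WindowSelfAttention(64, 16)(tokens).shape)  # torch.Size([2, 256, 64])
```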
Abstract:
Recent transformer techniques have achieved promising performance boosts in visual object tracking, with their capability to exploit long-range dependencies among relevant tokens. However, long-range interaction comes at the expense of heavy computation that grows quadratically with the number of tokens. This becomes particularly acute in online visual tracking with a memory bank containing multiple templates, a widely used strategy for handling spatiotemporal template variations. We address this complexity problem by proposing a memory prompt tracker (MPTrack) that enables multitemplate aggregation and efficient interactions among relevant queries and clues. The memory prompt gathers supporting context from the historical templates in the form of learnable token queries, producing a concise dynamic target representation. The extracted prompt tokens are then fed into a transformer encoder–decoder to inject the relevant clues into the instance, thus ...
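As a rough illustration of how a small set of learnable prompt queries could distill a multi-template memory bank into a compact target representation via cross-attention, the sketch below uses hypothetical shapes and module names; it is an assumption-laden approximation, not the authors' implementation.

```python
# Sketch of learnable prompt queries pooling a memory bank (illustrative assumptions throughout).
import torch
import torch.nn as nn

class MemoryPrompt(nn.Module):
    def __init__(self, dim: int, num_prompts: int = 8, heads: int = 4):
        super().__init__()
        # Learnable token queries that summarize historical target appearance.
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, memory_tokens: torch.Tensor) -> torch.Tensor:
        # memory_tokens: (batch, num_templates * tokens_per_template, dim)
        b = memory_tokens.size(0)
        q = self.prompts.unsqueeze(0).expand(b, -1, -1)
        prompt_tokens, _ = self.cross_attn(q, memory_tokens, memory_tokens)
        return prompt_tokens  # (batch, num_prompts, dim): a concise dynamic representation

# Usage: 5 historical templates of 64 tokens each are distilled into 8 prompt tokens,
# which could then be concatenated with search-region tokens in an encoder-decoder.
memory = torch.randn(2, 5 * 64, 256)
print(MemoryPrompt(256)(memory).shape)  # torch.Size([2, 8, 256])
```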
Published in: IEEE Transactions on Artificial Intelligence (Volume: 5, Issue: 8, August 2024)