Sparse Action Tube Detection

Abstract:

Action tube detection is a challenging task because it requires not only locating action instances in each frame but also linking them over time. Existing action tube detection methods often employ multi-stage pipelines with complex designs and time-consuming linking procedures. In this paper, we present a simple end-to-end action tube detection method, termed Sparse Tube Detector (STDet). Unlike dense action detectors, our core idea is to use a set of learnable tube queries and directly decode them into action tubes (i.e., a set of tracked boxes with an action label) from video content. This sparse detection paradigm offers several advantages. First, the large number of hand-crafted anchor candidates in dense action detectors is reduced to a small number of learnable tubes, which results in a more efficient detection framework. Second, our learnable tube queries attend directly to the whole video content, which gives our method the capacity to capture long-range information for action detection. Finally, our action detector is an end-to-end tube detector that requires no linking procedure: it directly and explicitly predicts the action boundary instead of depending on a linking strategy. Extensive experiments show that STDet outperforms the previous state-of-the-art methods on two challenging untrimmed video action detection datasets, UCF101-24 and MultiSports. We hope our method will serve as a simple end-to-end tube detection baseline and inspire new ideas in this direction.
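
The abstract describes the output of a sparse detector (a small set of queries, each decoded into one tube with an explicit temporal boundary) without giving its exact format. As a rough illustration only, the Python sketch below shows how per-query predictions could be turned into tubes with no cross-frame linking step. The names `QueryOutput`, `ActionTube`, `longest_active_run`, and `decode_tubes`, the per-frame "action is ongoing" score, and both thresholds are assumptions made for this example, not details taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class QueryOutput:
    """Hypothetical prediction for one tube query (illustrative fields)."""
    class_scores: List[float]  # one score per action class
    frame_scores: List[float]  # per-frame score that the action is ongoing
    boxes: List[Box]           # one box per frame


@dataclass
class ActionTube:
    label: int
    score: float
    start: int                 # first frame index (inclusive)
    end: int                   # last frame index (inclusive)
    boxes: Dict[int, Box]      # frame index -> box


def longest_active_run(frame_scores: List[float],
                       thresh: float) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the longest contiguous run with score >= thresh."""
    best = None
    run_start = None
    # A sentinel below any finite threshold closes a run that reaches the last frame.
    for t, s in enumerate(list(frame_scores) + [float("-inf")]):
        if s >= thresh:
            if run_start is None:
                run_start = t
        elif run_start is not None:
            if best is None or (t - run_start) > (best[1] - best[0] + 1):
                best = (run_start, t - 1)
            run_start = None
    return best


def decode_tubes(outputs: List[QueryOutput],
                 class_thresh: float = 0.5,
                 frame_thresh: float = 0.5) -> List[ActionTube]:
    """Turn each confident query into one tube; no linking across frames."""
    tubes = []
    for out in outputs:
        label = max(range(len(out.class_scores)),
                    key=out.class_scores.__getitem__)
        score = out.class_scores[label]
        if score < class_thresh:
            continue  # query does not correspond to any action
        span = longest_active_run(out.frame_scores, frame_thresh)
        if span is None:
            continue  # no frame is marked as active
        start, end = span
        boxes = {t: out.boxes[t] for t in range(start, end + 1)}
        tubes.append(ActionTube(label, score, start, end, boxes))
    return tubes
```

Because every query already carries one box per frame, the boxes of a tube are associated by construction, which is the property the abstract contrasts with frame-level detection followed by linking.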
Published in: IEEE Transactions on Image Processing (Volume: 33)
Page(s): 1740 - 1752
Date of Publication: 04 March 2024

PubMed ID: 38437142
